Extract GI and taxid from a blastdb
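Data can be extracted from a blastdb using blastdbcmd, which should be included in a BLAST installation. The -outfmt option lets you specify which metadata fields to include and in what order, using the format specifiers listed below.
From the man page: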
-outfmt <String>
Output format, where the available format specifiers are:
%f means sequence in FASTA format
%s means sequence data (without defline)
%a means accession
%g means gi
%o means ordinal id (OID)
%i means sequence id
%t means sequence title
%l means sequence length
%h means sequence hash value
%T means taxid
%X means leaf-node taxids
%e means membership integer
%L means common taxonomic name
%C means common taxonomic names for leaf-node taxids
%S means scientific name
%N means scientific names for leaf-node taxids
%B means BLAST name
%K means taxonomic super kingdom
%P means PIG
The example snippet shows how to extract the GI and taxid from a blastdb. The NCBI 16SMicrobial (FTP) blastdb was chosen for this example:
# Example:
# blastdbcmd -db <db label> -entry all -outfmt "%g %T" -out <outfile>
blastdbcmd -db 16SMicrobial -entry all -outfmt "%g %T" -out 16SMicrobial.gi_taxid.tsv
This will produce a file 16SMicrobial.gi_taxid.tsv that looks like this:
939733319 526714
636559958 429001
645319546 629680
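Once the GI/taxid file exists, it can be queried with standard tools. The following is a minimal sketch, assuming the two-column whitespace-separated layout shown above (GI first, taxid second). The function names taxid_for_gi and count_per_taxid are hypothetical helpers introduced for illustration; they are not part of BLAST+.

```shell
# Hypothetical helper: print the taxid recorded for a given GI.
# Usage: taxid_for_gi <gi> <gi_taxid_file>
taxid_for_gi() {
    # Column 1 is the GI, column 2 the taxid; stop at the first match.
    awk -v gi="$1" '$1 == gi { print $2; exit }' "$2"
}

# Hypothetical helper: print "taxid count" pairs, sorted numerically by taxid.
# Usage: count_per_taxid <gi_taxid_file>
count_per_taxid() {
    awk '{ n[$2]++ } END { for (t in n) print t, n[t] }' "$1" | sort -k1,1n
}

# Example calls (assuming the file from the command above exists):
#   taxid_for_gi 636559958 16SMicrobial.gi_taxid.tsv
#   count_per_taxid 16SMicrobial.gi_taxid.tsv
```

Because awk splits on any run of whitespace by default, the same functions should also work if the fields are separated by tabs rather than a single space.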