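Extracting GI and taxid from a blastdb
Data can be extracted from a blastdb using blastdbcmd, which should be included in a BLAST installation. With the -outfmt option you can specify which pieces of metadata to include in the output and the order in which they appear.
From the man page: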
-outfmt <String>
Output format, where the available format specifiers are:
%f means sequence in FASTA format
%s means sequence data (without defline)
%a means accession
%g means gi
%o means ordinal id (OID)
%i means sequence id
%t means sequence title
%l means sequence length
%h means sequence hash value
%T means taxid
%X means leaf-node taxids
%e means membership integer
%L means common taxonomic name
%C means common taxonomic names for leaf-node taxids
%S means scientific name
%N means scientific names for leaf-node taxids
%B means BLAST name
%K means taxonomic super kingdom
%P means PIG
The example snippet below shows how to extract the GI and taxid from a blastdb. The NCBI 16SMicrobial blastdb (available via FTP) is used for this example:
# Example:
# blastdbcmd -db <db label> -entry all -outfmt "%g %T" -out <outfile>
blastdbcmd -db 16SMicrobial -entry all -outfmt "%g %T" -out 16SMicrobial.gi_taxid.tsv
This produces a file, 16SMicrobial.gi_taxid.tsv, that looks like this:
939733319 526714
636559958 429001
645319546 629680
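Once the mapping file exists, a common next step is looking up the taxid for a given GI. The following is a minimal sketch of one way to do that with awk, not part of blastdbcmd itself. The helper name gi_to_taxid and the variable MAP_FILE are hypothetical, and the sample file is built from the three lines shown above so the snippet runs without a BLAST installation. Note that the format string "%g %T" separates the two fields with a space, so the columns are space-delimited despite the .tsv extension; awk's default field splitting handles either spaces or tabs.

```shell
#!/bin/sh
# Sample input: the three lines shown above (GI, then taxid, space-separated).
# In practice, set MAP_FILE to the real 16SMicrobial.gi_taxid.tsv instead.
MAP_FILE=$(mktemp)
cat > "$MAP_FILE" <<'EOF'
939733319 526714
636559958 429001
645319546 629680
EOF

# gi_to_taxid (hypothetical helper): print the taxid recorded for one GI.
# Prints nothing if the GI is not present in the mapping file.
gi_to_taxid() {
    awk -v gi="$1" '$1 == gi { print $2; exit }' "$MAP_FILE"
}

gi_to_taxid 636559958
```

With the sample data, the final call prints 429001, the taxid listed next to GI 636559958 in the output above.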