Goloboff Pablo A, Catalano Santiago A
Consejo Nacional de Investigaciones Científicas y Técnicas, Miguel Lillo 205, 4000 S.M. de Tucumán, Argentina.
Instituto Miguel Lillo, Facultad de Ciencias Naturales, Miguel Lillo 205, 4000 S.M. de Tucumán, Argentina.
Cladistics. 2012 Oct;28(5):503-513. doi: 10.1111/j.1096-0031.2012.00400.x. Epub 2012 May 4.
This paper presents a pipeline, implemented in an open-source program called GB→TNT (GenBank-to-TNT), for creating large molecular matrices, starting from GenBank files and finishing with TNT matrices that incorporate taxonomic information in the terminal names. GB→TNT is designed to retrieve a defined genomic region from the many sequences contained in a GenBank file. The user defines the genomic region to be retrieved and several filters (genome, sequence length, taxonomic group, etc.); each genomic region becomes a separate data block in the final TNT matrix. GB→TNT first generates FASTA files from the input GenBank files, then creates an alignment for each FASTA file (by calling an alignment program), and finally merges all the aligned files into a single TNT matrix. The new version of TNT can make use of the taxonomic information contained in the terminal names, allowing easy diagnosis of results, evaluation of fit between the trees and the taxonomy, and automatic labelling or colouring of tree branches with the taxonomic groups they represent. © The Willi Hennig Society 2012.
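The final merging step described in the abstract can be illustrated with a minimal Python sketch. This is not the GB→TNT code. It assumes each genomic region has already been aligned and is available as FASTA text, and it pads taxa missing from a region with `?`. The function names (`parse_fasta`, `block_width`, `merge_blocks`, `to_xread`), the taxon names, and the sequences are all hypothetical. The output layout (an `xread` header followed by one `&[dna]` section per region) is a simplified assumption about the TNT matrix format and should be checked against the TNT documentation before use.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the terminal name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def block_width(block):
    """Return the alignment width of a block, or raise if it is not aligned."""
    lengths = {len(seq) for seq in block.values()}
    if len(lengths) != 1:
        raise ValueError("block is empty or its sequences differ in length")
    return lengths.pop()


def merge_blocks(blocks):
    """Concatenate aligned blocks per taxon; absent taxa are padded with '?'."""
    taxa = sorted({name for block in blocks for name in block})
    merged = {name: "" for name in taxa}
    for block in blocks:
        width = block_width(block)
        for name in taxa:
            merged[name] += block.get(name, "?" * width)
    return merged


def to_xread(blocks):
    """Render blocks as xread-style text, one &[dna] section per block.

    The layout is an assumption about TNT input, not a verified specification.
    """
    taxa = sorted({name for block in blocks for name in block})
    widths = [block_width(block) for block in blocks]
    lines = ["xread", f"{sum(widths)} {len(taxa)}"]
    for block, width in zip(blocks, widths):
        lines.append("&[dna]")
        for name in taxa:
            lines.append(f"{name} {block.get(name, '?' * width)}")
    lines.append(";")
    return "\n".join(lines)


# Two hypothetical, already aligned regions with partly overlapping taxa.
region_a = parse_fasta(">Aus_bus\nACGT\n>Cus_dus\nAC-T\n")
region_b = parse_fasta(">Aus_bus\nGGA\n>Eus_fus\nGCA\n")

merged = merge_blocks([region_a, region_b])
matrix_text = to_xread([region_a, region_b])
print(matrix_text)
```

Keeping each region as its own section mirrors the abstract's statement that every genomic region is a separate data block, while `merge_blocks` shows the equivalent concatenated view of the same data.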