Vaidya Gaurav, Lohman David J, Meier Rudolf
Department of Biological Sciences.
University Scholars Programme, National University of Singapore, 14 Science Drive 4, Singapore 117543, Singapore.
Cladistics. 2011 Apr;27(2):171-180. doi: 10.1111/j.1096-0031.2010.00329.x.
We present SequenceMatrix, software that is designed to facilitate the assembly and analysis of multi-gene datasets. Genes are concatenated by dragging and dropping FASTA, NEXUS, or TNT files with aligned sequences into the program window. A multi-gene dataset is concatenated and displayed in a spreadsheet; each sequence is represented by a cell that provides information on sequence length, number of indels, the number of ambiguous bases ("Ns"), and the availability of codon information. Alternatively, GenBank numbers for the sequences can be displayed and exported. Matrices with hundreds of genes and taxa can be concatenated within minutes and exported in TNT, NEXUS, or PHYLIP formats, preserving both character set and codon information for TNT and NEXUS files. SequenceMatrix also creates taxon sets listing taxa with a minimum number of characters or gene fragments, which helps assess preliminary datasets. Entire taxa, whole gene fragments, or individual sequences for a particular gene and species can be excluded from export. Data matrices can be re-split into their component genes and the gene fragments can be exported as individual gene files. SequenceMatrix also includes two tools that help to identify sequences that may have been compromised through laboratory contamination or data management error. One tool lists identical or near-identical sequences within genes, while the other compares the pairwise distance pattern of one gene against the pattern for all remaining genes combined. SequenceMatrix is Java-based and compatible with the Microsoft Windows, Apple MacOS X and Linux operating systems. The software is freely available from http://code.google.com/p/sequencematrix/. © The Willi Hennig Society 2010.
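The concatenation step and the per-cell summary described in the abstract can be illustrated with a minimal Python sketch. This is not SequenceMatrix's own code (the program is Java-based and its internals are not described here). The names `concatenate` and `cell_summary`, the use of `?` to pad taxa that lack a gene, and counting gaps as part of the length are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of one aligned gene file: taxon name -> sequence.
Alignment = Dict[str, str]


def cell_summary(seq: str) -> Dict[str, int]:
    """Summarise one sequence as a spreadsheet cell might (assumed fields)."""
    upper = seq.upper()
    return {
        "length": len(seq),               # assumption: gaps count toward length
        "gaps": upper.count("-"),         # indel (gap) characters
        "ambiguous_n": upper.count("N"),  # ambiguous bases written as N
    }


def concatenate(
    genes: Dict[str, Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], Dict[str, Tuple[int, int]]]:
    """Join aligned genes into one matrix and record each gene's column range.

    Taxa absent from a gene are padded with `missing` so that every row has
    the same length. Column ranges are 1-based and inclusive, in the style
    of a NEXUS character set.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    charsets: Dict[str, Tuple[int, int]] = {}
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are empty or not aligned")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * width))
        charsets[gene] = (start, start + width - 1)
        start += width
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, charsets


if __name__ == "__main__":
    genes = {
        "COI": {"A": "ACGT", "B": "AC-N"},
        "28S": {"A": "GG", "C": "TT"},
    }
    matrix, charsets = concatenate(genes)
    for taxon, seq in matrix.items():
        print(taxon, seq, cell_summary(seq))
    print(charsets)
```

Keeping the column ranges alongside the matrix is what would allow a concatenated dataset to be split back into its component genes later, as the abstract describes.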