Borowiec Marek L
Department of Entomology and Nematology, UC Davis, Davis, United States.
PeerJ. 2016 Jan 28;4:e1660. doi: 10.7717/peerj.1660. eCollection 2016.
The amount of data used in phylogenetics has grown explosively in recent years, and many phylogenies are now inferred from hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each locus in addition to multiple analyses of gene subsets or concatenated sequences. Computationally efficient tools are needed for handling thousands of single-locus alignments or large concatenated alignments and for computing their properties. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines sequence-manipulation capabilities with a function that calculates basic statistics. The manipulation functions include conversion among popular formats, concatenation, extraction of sites and splitting according to a predefined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony-informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, uses parallel processing, and outperforms other popular tools at concatenation. AMAS is a Python 3 program that relies solely on modules from Python's standard library and needs no additional dependencies. The AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under the GNU General Public License.
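The summary statistics listed above can be illustrated with a minimal Python sketch. It is not AMAS's implementation or API: the function name `summarize_alignment`, the dictionary input, and the choice to treat everything other than A, C, G and T (gaps, `?`, `N`, IUPAC ambiguity codes) as undetermined are assumptions made for illustration. Here a site counts as variable when it contains at least two different determined states, and as parsimony informative when at least two of those states each occur in at least two taxa.

```python
from collections import Counter

# Assumption for this sketch: only A, C, G, T are "determined" nucleotides;
# gaps, '?', 'N' and any other ambiguity code count as undetermined.
DETERMINED = set("ACGT")


def summarize_alignment(alignment):
    """Compute basic statistics for an aligned DNA matrix.

    `alignment` maps taxon names to equal-length sequences (hypothetical input shape).
    """
    seqs = [s.upper() for s in alignment.values()]
    n_taxa = len(seqs)
    length = len(seqs[0]) if seqs else 0
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned (unequal lengths)")

    cells = n_taxa * length
    counts = Counter("".join(seqs))
    determined = sum(counts[c] for c in DETERMINED)
    undetermined = cells - determined

    variable = informative = 0
    for column in zip(*seqs):
        # Tally only determined states in this column.
        states = Counter(c for c in column if c in DETERMINED)
        if len(states) >= 2:
            variable += 1
            # Informative: at least two states, each present in at least two taxa.
            if sum(1 for n in states.values() if n >= 2) >= 2:
                informative += 1

    gc = counts["G"] + counts["C"]
    return {
        "taxa": n_taxa,
        "length": length,
        "cells": cells,
        "undetermined": undetermined,
        "missing_percent": 100 * undetermined / cells if cells else 0.0,
        # GC and AT contents as proportions of determined nucleotides.
        "gc_content": gc / determined if determined else 0.0,
        "at_content": (determined - gc) / determined if determined else 0.0,
        "variable_sites": variable,
        "parsimony_informative_sites": informative,
    }
```

For the four-taxon toy alignment `{"t1": "ACGT-A", "t2": "ACGTNA", "t3": "ATGAAA", "t4": "ATGA?C"}`, this sketch reports 24 cells, 3 undetermined characters (12.5% missing data), 3 variable sites and 2 parsimony-informative sites; the last column is variable but not informative because C occurs in only one taxon.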
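Concatenation with a partitioning scheme can likewise be sketched in a few lines. This is a hypothetical illustration, not AMAS's code: `concatenate` and `format_partitions` are invented names, taxa missing from a locus are assumed to be padded with `?`, and the `DNA, name = start-end` partition lines follow a RAxML-style convention chosen only as an example. Because each locus's range is recorded as it is appended, the same ranges could later drive the reverse operation of splitting the supermatrix back into loci.

```python
def concatenate(loci, missing="?"):
    """Concatenate per-locus alignments into one supermatrix.

    `loci` is an ordered mapping of locus name -> {taxon: sequence}
    (hypothetical input shape). Taxa absent from a locus are padded with
    `missing`. Returns (supermatrix, partitions), where each partition is
    (name, start, end) with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, aln in loci.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {name!r} is empty or not aligned")
        length = lengths.pop()
        for taxon in taxa:
            # Pad taxa that were not sampled for this locus.
            parts[taxon].append(aln.get(taxon, missing * length))
        partitions.append((name, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(chunks) for taxon, chunks in parts.items()}
    return supermatrix, partitions


def format_partitions(partitions, datatype="DNA"):
    # RAxML-style lines such as "DNA, locus1 = 1-4" (an illustrative format choice).
    return "\n".join(f"{datatype}, {name} = {s}-{e}" for name, s, e in partitions)
```

For two loci `{"locus1": {"a": "ACGT", "b": "ACGA"}, "locus2": {"a": "GG-", "c": "GTT"}}`, taxon `b` becomes `ACGA???`, taxon `c` becomes `????GTT`, and the partition lines are `DNA, locus1 = 1-4` and `DNA, locus2 = 5-7`.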