Maldonado Emanuel, Almeida Daniela, Escalona Tibisay, Khan Imran, Vasconcelos Vitor, Antunes Agostinho
CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal.
Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal.
BMC Bioinformatics. 2016 Sep 6;17(1):354. doi: 10.1186/s12859-016-1204-5.
Uncovering how phenotypic diversity arises and is maintained in nature has long been a major interest of evolutionary biologists. Recent advances in genome sequencing technologies have remarkably increased the efficiency of pinpointing genes involved in the adaptive evolution of phenotypes. The reliability of such findings is most often examined with statistical and computational methods using Maximum Likelihood codon-based models (i.e., site, branch, branch-site and clade models), such as those available in codeml from the Phylogenetic Analysis by Maximum Likelihood (PAML) package. While these models represent a well-defined workflow for documenting adaptive evolution, in practice they can be challenging for researchers handling large amounts of data: many types of relevant codon-based datasets are generated, making the overall process tedious, error-prone and time-consuming.
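To make the model families above concrete, the sketch below renders a codeml control file for one dataset and one model. The variable names (`seqfile`, `treefile`, `outfile`, `seqtype`, `CodonFreq`, `model`, `NSsites`, `fix_omega`, `omega`, and so on) are standard codeml control-file options. The shared defaults, the preset names, and the initial `omega` values are illustrative assumptions, not LMAP's actual settings. Branch, branch-site and clade models also assume the tree file marks the foreground lineage(s), for example with `#1`.

```python
# Illustrative defaults shared by every run (assumed values, not LMAP's configuration).
BASE = {
    "noisy": 3,
    "verbose": 1,
    "runmode": 0,      # 0 = use the user-supplied tree
    "seqtype": 1,      # 1 = codon sequences
    "CodonFreq": 2,    # 2 = F3x4 codon frequencies
    "clock": 0,
    "icode": 0,        # 0 = universal genetic code
    "fix_kappa": 0,
    "kappa": 2,
    "fix_omega": 0,
    "omega": 0.5,      # initial omega value
    "cleandata": 0,
}

# Model-specific switches for some common codeml analyses (hypothetical preset names).
PRESETS = {
    "M0": {"model": 0, "NSsites": 0},                  # one-ratio model
    "M1a": {"model": 0, "NSsites": 1},                 # site model, nearly neutral
    "M2a": {"model": 0, "NSsites": 2},                 # site model, positive selection
    "M7": {"model": 0, "NSsites": 7},                  # site model, beta
    "M8": {"model": 0, "NSsites": 8},                  # site model, beta & omega
    "branch_two_ratio": {"model": 2, "NSsites": 0},    # branch model
    "branch_site_A": {"model": 2, "NSsites": 2, "fix_omega": 0, "omega": 1.5},
    "branch_site_A_null": {"model": 2, "NSsites": 2, "fix_omega": 1, "omega": 1},
    "clade_C": {"model": 3, "NSsites": 2},             # clade model C
}


def render_ctl(seqfile: str, treefile: str, outfile: str, preset: str) -> str:
    """Return the text of a codeml control file for one dataset and one model."""
    settings = {"seqfile": seqfile, "treefile": treefile, "outfile": outfile}
    settings.update(BASE)
    settings.update(PRESETS[preset])  # preset values override the shared defaults
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"
```

Writing the returned text to a file named, say, `codeml.ctl` inside a per-model directory and running `codeml codeml.ctl` there is one way to keep each model's output separate, which is the kind of bookkeeping the abstract describes as tedious when done by hand.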
We introduce LMAP (Lightweight Multigene Analyses in PAML), a user-friendly command-line and interactive package designed to handle the codeml workflow, namely directory organization, execution, and the gathering and organization of results for Likelihood Ratio Test (LRT) estimation, with minimal manual user intervention. LMAP was developed for the multi-core workstation environment and offers a unique advantage: it can run one, several, or all codeml codon-based models on multiple datasets at a time. Our software proved efficient throughout the codeml workflow, including, but not limited to, simultaneously handling more than 20 datasets.
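The final step of that workflow, the Likelihood Ratio Test, compares the log-likelihoods (lnL) of a null and an alternative model. The sketch below computes the statistic $2\Delta\ell = 2(\ln L_1 - \ln L_0)$ and its chi-square p-value using only the standard library. The degrees of freedom depend on the model pair (for example, 2 for M1a vs M2a or M7 vs M8, and 1 for branch-site Model A null vs alternative); the function names and the lnL values in the usage example are hypothetical, and this is not LMAP's own code.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus exp(-x/2) * sum (x/2)^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def lrt(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (2 * delta lnL, p-value) for nested codeml models.

    A slightly negative statistic (e.g. from convergence noise) is clamped to 0.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical lnL values for an M7 vs M8 comparison (df = 2).
stat, p = lrt(-1000.0, -995.0, df=2)
print(f"2dlnL = {stat:.2f}, p = {p:.4g}")
```

Because the even-df case reduces to a finite sum and the odd-df case to `erfc` plus a finite series, no external statistics library is needed for the integer degrees of freedom that codeml model comparisons use.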
We have developed LMAP, a simple and versatile package with outstanding performance that enables researchers to analyze multiple different codon-based datasets in a high-throughput fashion. At minimum, two file types are required within a single input directory: one for the multiple sequence alignment and another for the phylogenetic tree. To our knowledge, no other software combines all codeml codon substitution models of adaptive evolution. LMAP has been developed as an open-source package, allowing its integration into more complex open-source bioinformatics pipelines. The LMAP package is released under the GPLv3 license and is freely available at http://lmapaml.sourceforge.net/ .
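The single-input-directory requirement can be illustrated with a small discovery step that pairs each alignment with its tree. This is a minimal sketch assuming that files belonging to the same dataset share a base name and that alignments and trees are recognized by the hypothetical extensions listed below; LMAP's own naming rules may differ.

```python
from pathlib import Path

# Hypothetical file extensions; LMAP's actual conventions may differ.
ALIGNMENT_EXTS = {".phy", ".fasta", ".fas"}
TREE_EXTS = {".tree", ".nwk", ".tre"}


def pair_inputs(input_dir: str) -> dict[str, tuple[Path, Path]]:
    """Map each dataset name to its (alignment, tree) pair found in input_dir."""
    alignments: dict[str, Path] = {}
    trees: dict[str, Path] = {}
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories such as earlier output folders
        if path.suffix in ALIGNMENT_EXTS:
            alignments[path.stem] = path
        elif path.suffix in TREE_EXTS:
            trees[path.stem] = path
    # Any name present on only one side is a dataset missing one of its two files.
    unmatched = sorted(set(alignments) ^ set(trees))
    if unmatched:
        raise ValueError(f"datasets without a matching alignment/tree: {unmatched}")
    return {name: (alignments[name], trees[name]) for name in sorted(alignments)}
```

Failing early on unmatched files, rather than skipping them silently, keeps a batch of many datasets from finishing with some genes quietly missing from the results.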