Microbial Evolution Research Group, Department of Biology, University of Oslo, Norway.
BMC Bioinformatics. 2009 Oct 28;10:357. doi: 10.1186/1471-2105-10-357.
In recent years, large multigene sequence alignments have been increasingly employed for phylogenomic reconstruction of the eukaryote tree of life. Such supermatrices of sequence data are preferred over single-gene alignments because they contain vastly more information about ancient sequence characteristics and are thus more suitable for resolving deeply diverging relationships. However, as alignments are expanded, increasing numbers of sites with misleading phylogenetic information are also added. A major goal in phylogenomic analyses is therefore to maximize the ratio of information to noise; this can be achieved by removing fast-evolving sites.
Here we present a batch-oriented, web-based program package named AIR that allows 1) transformation of several single-gene alignments into one multigene alignment, 2) identification of evolutionary rates in multigene alignments and 3) removal of fast-evolving sites. These three processes are carried out by the programs AIR-Appender, AIR-Identifier and AIR-Remover (AIR), which can be used independently or in a semi-automated pipeline. AIR produces user-friendly output files with filtered and non-filtered alignments in which residues are colored according to their evolutionary rates. Other bioinformatics applications linked to the AIR package are available at the Bioportal (http://www.bioportal.uio.no), University of Oslo; together these greatly improve the flexibility, efficiency and quality of phylogenomic analyses.
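The appending and removal steps described above can be illustrated with a minimal Python sketch. This is not the AIR implementation: the function names `append_alignments` and `remove_fast_sites`, the dictionary-based alignment representation, and the `max_rate` cutoff are all assumptions made for illustration, and the per-site rates are taken as given input rather than estimated.

```python
from typing import Dict, List, Sequence


def append_alignments(genes: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Concatenate single-gene alignments (taxon -> aligned sequence) into one
    multigene alignment. Taxa missing from a gene are padded with gap characters.
    Hypothetical helper, not part of AIR."""
    taxa = sorted({taxon for gene in genes for taxon in gene})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for gene in genes:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in a gene alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(gene.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def remove_fast_sites(
    alignment: Dict[str, str], site_rates: Sequence[float], max_rate: float
) -> Dict[str, str]:
    """Drop every alignment column whose rate exceeds max_rate.
    site_rates holds one (assumed, externally estimated) rate per column."""
    if any(len(seq) != len(site_rates) for seq in alignment.values()):
        raise ValueError("need exactly one rate per alignment column")
    keep = [i for i, rate in enumerate(site_rates) if rate <= max_rate]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


if __name__ == "__main__":
    gene1 = {"A": "MKV", "B": "MRV"}
    gene2 = {"A": "LL", "C": "LI"}
    supermatrix = append_alignments([gene1, gene2])
    rates = [0.2, 3.1, 0.5, 0.4, 2.8]  # illustrative values only
    print(remove_fast_sites(supermatrix, rates, max_rate=1.0))
```

Padding absent taxa with gaps keeps every row of the supermatrix the same length, so a single list of per-site rates can index all sequences during filtering.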
The AIR program package allows efficient creation of multigene alignments and better assessment of evolutionary rates in sequence alignments. Removal of fast-evolving sites with the AIR programs has been employed in several recent phylogenomic analyses, resulting in improved phylogenetic resolution and increased statistical support for branching patterns among the early diverging eukaryotes.