Bininda-Emonds Olaf R P
Lehrstuhl für Tierzucht, Technical University of Munich, Hochfeldweg 1, 85354 Freising-Weihenstephan, Germany.
BMC Bioinformatics. 2005 Jun 22;6:156. doi: 10.1186/1471-2105-6-156.
Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations making use of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets.
transAlign is an open-source Perl script that aligns protein-coding DNA sequences via their amino-acid translations to take advantage of the superior multiple-alignment capabilities and speed of an amino-acid alignment. It operates by translating each DNA sequence into its corresponding amino-acid sequence, passing the entire matrix to ClustalW for alignment, and then back-translating the resulting amino-acid alignment to derive the aligned DNA sequences. In the translation step, transAlign determines the optimal orientation and reading frame for each DNA sequence according to the desired genetic code. It also checks for apparent frame shifts in the DNA sequences and can handle frame-shifted sequences in one of three ways (delete, align as amino acids regardless, or profile align as DNA). As a set of comparative benchmarks derived from six protein-coding genes for mammals shows, the strategy implemented in transAlign always improves the speed and usually the apparent accuracy of the alignment of protein-coding DNA sequences.
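The translate, align, back-translate pipeline described above can be illustrated with a minimal Python sketch. It is not transAlign's Perl code. The function names (`translate`, `reverse_complement`, `best_frame`, `back_translate`) are hypothetical, only the standard genetic code is included, and the rule of choosing the frame with the fewest internal stop codons is an assumption made for illustration; the abstract does not state which criterion transAlign uses. The ClustalW step is not invoked; a hand-gapped protein string stands in for its output.

```python
# Standard genetic code (NCBI translation table 1), built from the
# conventional TCAG ordering of the three codon positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_frame(dna):
    """Pick an orientation and reading frame for one sequence.

    ASSUMPTION: the frame with the fewest internal stop codons wins
    (a trailing stop is ignored). Ties go to the first candidate tried:
    forward strand before reverse, smaller offset first.
    """
    candidates = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            internal_stops = protein[:-1].count("*")
            candidates.append((internal_stops, strand, offset, protein))
    _, strand, offset, protein = min(candidates, key=lambda c: c[0])
    return strand, offset, protein


def back_translate(aligned_protein, dna):
    """Thread the codons of `dna` through a gapped protein alignment row.

    Each residue consumes the next codon; each '-' becomes '---'.
    `dna` must already be in the orientation and frame that was translated.
    """
    codons = iter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


if __name__ == "__main__":
    dna = "ATGGCCAAA"
    protein = translate(dna)            # "MAK"
    aligned_protein = "M-AK"            # stand-in for one row of aligner output
    print(back_translate(aligned_protein, dna))  # ATG---GCCAAA
```

Because gaps are only ever inserted as whole `---` triplets, the back-translated DNA alignment keeps every sequence in frame, which is the property that makes aligning at the amino-acid level attractive for coding sequences.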
transAlign is one of the few complete, cross-platform implementations of the concept of translated alignments. The advantages of performing a translated alignment, together with the suite of user-definable options available in the program, make transAlign well suited to large-scale automated alignment of very large and/or very numerous protein-coding DNA data sets. The good performance offered by the program also carries over to the alignment of any set of protein-coding sequences. transAlign, including the source code, is freely available at http://www.tierzucht.tum.de/Bininda-Emonds/ (under "Programs").