transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences.

Author information

Bininda-Emonds Olaf R P

Affiliation

Lehrstuhl für Tierzucht, Technical University of Munich, Hochfeldweg 1, 85354 Freising-Weihenstephan, Germany.

Publication information

BMC Bioinformatics. 2005 Jun 22;6:156. doi: 10.1186/1471-2105-6-156.

DOI:10.1186/1471-2105-6-156

PMID:15969769

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1175081/

Abstract

BACKGROUND

Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations making use of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets.

RESULTS

transAlign is an open-source Perl script that aligns protein-coding DNA sequences via their amino-acid translations to take advantage of the superior multiple-alignment capabilities and speed of an amino-acid alignment. It operates by translating each DNA sequence into its corresponding amino-acid sequence, passing the entire matrix to ClustalW for alignment, and then back-translating the resulting amino-acid alignment to derive the aligned DNA sequences. In the translation step, transAlign determines the optimal orientation and reading frame for each DNA sequence according to the desired genetic code. It also checks for apparent frame shifts in the DNA sequences and can handle frame-shifted sequences in one of three ways (delete, align as amino acids regardless, or profile align as DNA). As a set of comparative benchmarks derived from six protein-coding genes for mammals shows, the strategy implemented in transAlign always improves the speed and usually the apparent accuracy of the alignment of protein-coding DNA sequences.
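The translate, align, back-translate pipeline described above can be illustrated with a minimal Python sketch. It is not transAlign's actual code (transAlign is a Perl script), and several things here are assumptions made for illustration: the function names are hypothetical, the standard genetic code (NCBI translation table 1) is hard-coded, the "optimal" orientation and reading frame is taken to be the one with the fewest internal stop codons (the abstract does not state the criterion), and the aligned protein string stands in for the output of an external aligner such as ClustalW.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognised codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def best_frame(dna):
    """Pick the strand and offset (0-2) with the fewest internal stop codons.

    Assumed heuristic: a stop in the final codon is not counted, and ties
    go to the first candidate tried (forward strand, offset 0).
    """
    candidates = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            protein = translate(strand[offset:])
            internal_stops = protein[:-1].count("*")
            candidates.append((internal_stops, strand, offset))
    _, strand, offset = min(candidates, key=lambda c: c[0])
    return strand, offset


def back_translate(aligned_protein, coding_dna):
    """Thread codons onto an aligned protein; each gap becomes '---'."""
    codons = (coding_dna[i:i + 3] for i in range(0, len(coding_dna), 3))
    return "".join("---" if aa == "-" else next(codons)
                   for aa in aligned_protein)


# Usage: the input arrives on the wrong strand; recover frame, then back-translate.
cds = "ATGCTAACTGATATCAAGGGTTACGCA"
strand, offset = best_frame(reverse_complement(cds))
protein = translate(strand[offset:])
aligned_protein = "MLT--DIKGYA"  # hypothetical aligner output for this sequence
print(back_translate(aligned_protein, strand[offset:]))
```

Because gaps are inserted only between whole codons, the back-translated DNA alignment keeps the reading frame intact, which is the property that makes a translated alignment attractive for protein-coding sequences. A stop-codon count is a weak signal on short sequences, so a real implementation would likely need additional checks, including the frame-shift handling mentioned above.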

CONCLUSION

transAlign represents one of few full and cross-platform implementations of the concept of translated alignments. Both the advantages accruing from performing a translated alignment and the suite of user-definable options available in the program mean that transAlign is ideally suited for large-scale automated alignments of very large and/or very numerous protein-coding DNA data sets. However, the good performance offered by the program also translates to the alignment of any set of protein-coding sequences. transAlign, including the source code, is freely available at http://www.tierzucht.tum.de/Bininda-Emonds/ (under "Programs").





