Multiple sequence alignments of partially coding nucleic acid sequences.

Author information

Stocsits Roman R, Hofacker Ivo L, Fried Claudia, Stadler Peter F

Affiliation

Interdisciplinary Centre for Bioinformatics, University of Leipzig, Haertelstrasse 16-18, D-04107 Leipzig, Germany.

Publication information

BMC Bioinformatics. 2005 Jun 28;6:160. doi: 10.1186/1471-2105-6-160.

DOI:10.1186/1471-2105-6-160

PMID:15985156

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1182351/

Abstract

BACKGROUND

High-quality alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Because of the redundancy of the genetic code, however, nucleic acid sequences exhibit much greater sequence heterogeneity than the protein sequences they encode. It is therefore desirable to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only part of the sequence of interest is translated. Conversely, overlapping reading frames may encode multiple alternative proteins, possibly with intermittent non-coding parts. RNA virus genomes are prominent examples.

RESULTS

The standard scoring scheme for nucleic acid alignments can be extended to simultaneously incorporate information on translation products in one or more reading frames. Here we present a multiple alignment tool, codaln, that implements a combined nucleic acid plus amino acid scoring model for pairwise and progressive multiple alignments and allows arbitrary weighting of almost all scoring parameters. The resource requirements of codaln are comparable to those of standard tools such as ClustalW.
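
The idea of a combined nucleic acid plus amino acid score can be illustrated with a minimal sketch. It is not codaln's implementation or parameterization: the function names (`codon_at`, `combined_score`, `align_score`), the +1/-1 nucleotide scores, the identity-based amino acid term, the linear gap penalty, the single annotated coding region per sequence, and the weight `w_aa` are all assumptions made for illustration. The sketch computes a Needleman-Wunsch global alignment score in which each aligned nucleotide pair receives an extra, weighted amino acid term when both positions sit at the same codon phase of an annotated coding region.

```python
from itertools import product

# Standard genetic code in TCAG order (DNA alphabet, uppercase assumed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_at(seq, i, cds):
    """Return (amino_acid, phase) for position i, or None if non-coding.

    cds is a hypothetical annotation: a half-open (start, end) interval
    marking one coding region, or None for a fully non-coding sequence.
    """
    if cds is None:
        return None
    start, end = cds
    if not (start <= i < end):
        return None
    phase = (i - start) % 3
    first = i - phase
    if first + 3 > end:
        return None  # incomplete codon at the end of the region
    aa = CODON.get(seq[first:first + 3])
    return None if aa is None else (aa, phase)


def combined_score(a, i, cds_a, b, j, cds_b, w_aa=1.0):
    """Nucleotide score plus a weighted amino acid term (illustrative values)."""
    score = 1.0 if a[i] == b[j] else -1.0
    ca, cb = codon_at(a, i, cds_a), codon_at(b, j, cds_b)
    # The protein-level term applies only when both positions are coding
    # and occupy the same position within their codons.
    if ca is not None and cb is not None and ca[1] == cb[1]:
        # One third per nucleotide, so a fully aligned codon adds +/- w_aa.
        score += w_aa * (1.0 if ca[0] == cb[0] else -1.0) / 3.0
    return score


def align_score(a, cds_a, b, cds_b, w_aa=1.0, gap=-2.0):
    """Global alignment score (Needleman-Wunsch, linear gap penalty)."""
    n, m = len(a), len(b)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + combined_score(a, i - 1, cds_a, b, j - 1, cds_b, w_aa),
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
        prev = cur
    return prev[m]


if __name__ == "__main__":
    # Two sequences that differ only at synonymous third codon positions.
    a, b = "ATGCTTTCA", "ATGCTCTCG"
    print(align_score(a, (0, 9), b, (0, 9), w_aa=0.0))
    print(align_score(a, (0, 9), b, (0, 9), w_aa=1.0))
```

In this toy setting, setting `w_aa` to zero reduces the score to a plain nucleotide alignment, while a positive `w_aa` rewards alignments that keep synonymous codons in register. A realistic scheme would likely use a substitution matrix for the amino acid term, affine gap costs, and several reading frames per sequence.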

CONCLUSION

We demonstrate the applicability of codaln to various biologically relevant types of sequences (bacteriophage Levivirus and vertebrate Hox clusters) and show that combining nucleic acid and amino acid sequence information leads to improved alignments. These, in turn, increase the performance of analysis tools that depend strictly on good input alignments, such as methods for detecting conserved RNA secondary structure elements.
