MARS: improving multiple circular sequence alignment using refined sequences.

Author information

Ayad Lorraine A K, Pissis Solon P

Affiliation

Department of Informatics, King's College London, Strand, London, WC2R 2LS, UK.

Publication information

BMC Genomics. 2017 Jan 14;18(1):86. doi: 10.1186/s12864-016-3477-5.

Abstract

BACKGROUND

A fundamental assumption of all widely used multiple sequence alignment techniques is that the left- and right-most positions of the input sequences are relevant to the alignment. However, the position where a sequence starts or ends can be entirely arbitrary for a number of reasons: arbitrariness in the linearisation (sequencing) of a circular molecular structure, or inconsistencies introduced into sequence databases by different linearisation standards. These scenarios are relevant, for instance, to the multiple sequence alignment of mitochondrial DNA, viroid, viral or other genomes that have a circular molecular structure. One solution to these inconsistencies is to identify a suitable rotation (cyclic shift) for each sequence; the refined sequences may in turn lead to improved multiple sequence alignments with the preferred multiple sequence alignment program.
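To make the notion of a rotation concrete, the following minimal sketch shows how two linearisations of the same circular molecule can look almost entirely different position by position, and how a single cyclic shift removes the discrepancy. The toy sequence, the function names `rotate` and `hamming`, and the use of Hamming distance are illustrative assumptions, not taken from the paper.

```python
def rotate(seq: str, k: int) -> str:
    """Return seq cyclically shifted left by k positions."""
    if not seq:
        return seq
    k %= len(seq)
    return seq[k:] + seq[:k]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: one circular molecule, linearised at two
# different starting points.
genome = "ACGTTGCAAT"
other_linearisation = rotate(genome, 4)  # "TGCAATACGT"

# Compared as-is, the two linearisations disagree almost everywhere.
naive = hamming(genome, other_linearisation)

# Shifting by the remaining 6 positions restores the original start.
refined = hamming(genome, rotate(other_linearisation, len(genome) - 4))

print(naive, refined)  # prints "9 0"
```

The two strings describe the same molecule, so the nine mismatches in the naive comparison are purely an artefact of where each linearisation happens to begin.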

RESULTS

We present MARS, a new heuristic method for improving Multiple circular sequence Alignment using Refined Sequences. MARS is implemented in C++ as a program that computes the rotations (cyclic shifts) required to best align a set of input sequences. Experimental results on real and synthetic data show that MARS improves the alignments with respect to standard genetic measures and the inferred maximum-likelihood-based phylogenies, and that it outperforms state-of-the-art methods in both accuracy and efficiency. Among other findings, the average pairwise distance in the multiple sequence alignment of a dataset of widely studied mitochondrial DNA sequences is reduced by around 5% when MARS is applied before the alignment is performed.
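As a rough illustration of the task described here, the sketch below picks a rotation for each sequence by exhaustive search against a reference and then reports an average pairwise distance. It is a naive baseline under stated assumptions, not the MARS heuristic: the choice of the first sequence as reference, plain edit distance as the measure, the toy data, and all function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # match or substitution
        prev = curr
    return prev[-1]


def best_rotation(ref: str, seq: str) -> tuple[int, str]:
    """Try every cyclic shift of seq; return (shift, rotated) closest to ref."""
    if not seq:
        return 0, seq
    shift = min(range(len(seq)),
                key=lambda k: edit_distance(ref, seq[k:] + seq[:k]))
    return shift, seq[shift:] + seq[:shift]


def refine(seqs: list[str]) -> list[str]:
    """Rotate every sequence towards the first one (assumed reference)."""
    ref = seqs[0]
    return [ref] + [best_rotation(ref, s)[1] for s in seqs[1:]]


def average_pairwise_distance(seqs: list[str]) -> float:
    """Mean edit distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: three linearisations of one circular molecule.
seqs = ["ACGTTGCAAT", "TGCAATACGT", "CAATACGTTG"]
before = average_pairwise_distance(seqs)
after = average_pairwise_distance(refine(seqs))
```

On this toy input `after` is `0.0`, because all three strings are rotations of one another. The exhaustive search costs one full edit-distance computation per candidate shift, which is the kind of expense a heuristic would aim to avoid on genome-scale inputs.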

CONCLUSIONS

Analysing multiple sequences simultaneously is fundamental in biological research, and multiple sequence alignment is a popular method for this task. Conventional alignment techniques cannot be used effectively when the position where sequences start is arbitrary. We present a method that can be used in conjunction with any multiple sequence alignment program to address this problem effectively and efficiently.

Similar articles

1
MARS: improving multiple circular sequence alignment using refined sequences.
BMC Genomics. 2017 Jan 14;18(1):86. doi: 10.1186/s12864-016-3477-5.
2
CSA: an efficient algorithm to improve circular DNA multiple alignment.
BMC Bioinformatics. 2009 Jul 23;10:230. doi: 10.1186/1471-2105-10-230.
3
GATA: a graphic alignment tool for comparative sequence analysis.
BMC Bioinformatics. 2005 Jan 17;6:9. doi: 10.1186/1471-2105-6-9.
4
transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences.
BMC Bioinformatics. 2005 Jun 22;6:156. doi: 10.1186/1471-2105-6-156.
5
Alignment of protein sequences by their profiles.
Protein Sci. 2004 Apr;13(4):1071-87. doi: 10.1110/ps.03379804.
6
ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach.
BMC Bioinformatics. 2014 Aug 7;15(1):265. doi: 10.1186/1471-2105-15-265.
7
OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy.
BMC Bioinformatics. 2003 Oct 10;4:47. doi: 10.1186/1471-2105-4-47.
8
Ancestral sequence alignment under optimal conditions.
BMC Bioinformatics. 2005 Nov 17;6:273. doi: 10.1186/1471-2105-6-273.
10
SeqTools: visual tools for manual analysis of sequence alignments.
BMC Res Notes. 2016 Jan 22;9:39. doi: 10.1186/s13104-016-1847-3.

Cited by

1
Prediction of Circular RNA Secondary Structures and Their Targets.
Adv Exp Med Biol. 2025;1485:59-74. doi: 10.1007/978-981-96-9428-0_5.
2
Genomic Architecture of the Clownfish Hybrid Amphiprion leucokranos.
Genome Biol Evol. 2025 Mar 6;17(3). doi: 10.1093/gbe/evaf031.
3
A Post-Mortem Molecular Damage Profile in the Ancient Human Mitochondrial DNA.
Mol Ecol Resour. 2025 May;25(4):e14061. doi: 10.1111/1755-0998.14061. Epub 2025 Jan 7.
4
Rapid speciation in the holopelagic ctenophore following glacial recession.
bioRxiv. 2024 Nov 9:2024.10.10.617593. doi: 10.1101/2024.10.10.617593.
5
An easy-to-use pipeline to analyze amplicon-based Next Generation Sequencing results of human mitochondrial DNA from degraded samples.
PLoS One. 2024 Nov 21;19(11):e0311115. doi: 10.1371/journal.pone.0311115. eCollection 2024.
6
Viroid-like colonists of human microbiomes.
Cell. 2024 Nov 14;187(23):6521-6536.e18. doi: 10.1016/j.cell.2024.09.033. Epub 2024 Oct 30.
7
DNA barcodes are ineffective for species identification of corals from the aquarium trade.
Biodivers Data J. 2024 Jul 17;12:e125914. doi: 10.3897/BDJ.12.e125914. eCollection 2024.
8
Full-genome sequencing of dozens of new DNA viruses found in Spanish bat feces.
Microbiol Spectr. 2024 Aug 6;12(8):e0067524. doi: 10.1128/spectrum.00675-24. Epub 2024 Jul 11.
10
CircSeqAlignTk: An R package for end-to-end analysis of RNA-seq data for circular genomes.
F1000Res. 2024 Apr 30;11:1221. doi: 10.12688/f1000research.127348.1. eCollection 2022.
