The multiple alignments of very short sequences.

Authors

Takács Kristóf, Grolmusz Vince

Affiliations

PIT Bioinformatics Group, Eötvös University, Budapest, Hungary.

Uratim Ltd, Budapest, Hungary.

Publication information

FASEB Bioadv. 2021 Apr 29;3(7):523-530. doi: 10.1096/fba.2020-00118. eCollection 2021 Jul.

DOI:10.1096/fba.2020-00118

PMID:34258521

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8255854/

Abstract

Multiple sequence alignment (MSA) is an increasingly important task in bioinformatics, as gene and protein sequence databases keep growing. MSA is applied in phylogenetic analysis, in discovering conserved protein domains, in assigning secondary and tertiary structural features in proteins, and in metagenomic sample analysis and gene discovery. The focus is usually on the MSA of long sequences, since these tasks arise most frequently in practice. However, the rigorous analysis of the optimal MSA of short sequences is a neglected area, and findings there may contribute to better and faster algorithms for the multiple alignment of long sequences. In the present contribution, we examine length-1 sequences under an arbitrary metric and length-2 sequences under the unit metric, and we show that the optimum of the MSA problem is achieved by the trivial alignment in both cases.
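The length-1 claim in the abstract can be checked by brute force on small inputs. The sketch below is illustrative only and is not taken from the paper: it assumes a sum-of-pairs cost, a metric defined on the alphabet extended with a gap symbol, and alignments without all-gap columns. The names `sp_cost`, `alignments_of_single_letters`, `line_metric`, `unit_metric`, and `trivial_is_optimal` are hypothetical helpers introduced for this example.

```python
from itertools import combinations, product

GAP = "-"


def sp_cost(rows, dist):
    """Sum-of-pairs cost (assumed scoring): add dist over each column of each row pair."""
    return sum(
        dist(x, y)
        for r1, r2 in combinations(rows, 2)
        for x, y in zip(r1, r2)
    )


def alignments_of_single_letters(letters):
    """Yield every alignment of length-1 sequences that has no all-gap column."""
    n = len(letters)
    for width in range(1, n + 1):
        # cols[i] is the column in which sequence i places its single letter
        for cols in product(range(width), repeat=n):
            if len(set(cols)) == width:  # every column holds at least one letter
                yield [
                    "".join(ch if c == k else GAP for k in range(width))
                    for ch, c in zip(letters, cols)
                ]


def line_metric(x, y):
    """Illustrative metric: symbols (gap included) placed at points on a line."""
    pos = {GAP: 0, "A": 1, "C": 3, "G": 4, "T": 7}
    return abs(pos[x] - pos[y])


def unit_metric(x, y):
    """Unit metric: 0 for equal symbols, 1 otherwise."""
    return 0 if x == y else 1


def trivial_is_optimal(letters, dist):
    """Compare the one-column (trivial) alignment with the brute-force optimum."""
    trivial = sp_cost(list(letters), dist)
    best = min(sp_cost(rows, dist) for rows in alignments_of_single_letters(letters))
    return trivial == best


if __name__ == "__main__":
    print(trivial_is_optimal("ACGT", line_metric))
    print(trivial_is_optimal("AACGT", unit_metric))
```

Under these assumptions the result follows from the triangle inequality: placing two letters a and b in different columns costs d(a, gap) + d(gap, b), which is at least d(a, b), the cost of placing them in the same column. The enumeration grows quickly with the number of sequences, so it is only suitable as a sanity check on a handful of inputs.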
