Takács Kristóf, Grolmusz Vince
PIT Bioinformatics Group, Eötvös University, Budapest, Hungary.
Uratim Ltd., Budapest, Hungary.
FASEB Bioadv. 2021 Apr 29;3(7):523-530. doi: 10.1096/fba.2020-00118. eCollection 2021 Jul.
Multiple sequence alignment (MSA) is an increasingly important task in bioinformatics, as gene and protein sequence databases keep growing. MSA is applied in phylogenetic analysis, in discovering conserved protein domains, in assigning secondary and tertiary structural features in proteins, and in metagenomic sample analysis and gene discovery. The focus is usually on the MSA of long sequences, since these are the tasks that arise most often in practice. The rigorous analysis of the optimal MSA of short sequences, however, is a neglected area, and findings there may contribute to better and faster algorithms for the multiple alignment of long sequences. In the present contribution, we examine length-1 sequences under an arbitrary metric and length-2 sequences under the unit metric, and we show that in both cases the optimum of the MSA problem is achieved by the trivial alignment.
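The length-1 claim can be checked by brute force on a small instance. The sketch below is illustrative only and rests on assumptions that the abstract does not state: that the alignment cost is the sum-of-pairs cost, that the gap symbol is a point of the same metric space as the letters, and that a gap facing a gap costs nothing. The names `set_partitions`, `sp_cost`, and `unit_metric` are hypothetical helpers, not taken from the paper.

```python
from itertools import combinations

GAP = "-"  # assumption: the gap is treated as one more symbol of the metric space


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + partition


def sp_cost(letters, partition, dist):
    """Sum-of-pairs cost of one alignment of length-1 sequences.

    Each block of `partition` lists the sequences whose single letter shares
    a column; every other sequence has a gap in that column. The order of
    the columns does not affect the cost, so a set partition is enough.
    """
    column_of = {i: c for c, block in enumerate(partition) for i in block}
    total = 0
    for i, j in combinations(range(len(letters)), 2):
        if column_of[i] == column_of[j]:
            total += dist(letters[i], letters[j])
        else:
            # Each letter faces a gap of the other sequence in its own column.
            total += dist(letters[i], GAP) + dist(GAP, letters[j])
    return total


def unit_metric(x, y):
    """Unit metric: 0 for equal symbols, 1 otherwise."""
    return 0 if x == y else 1


letters = ["A", "C", "A", "G"]
indices = list(range(len(letters)))

# Trivial alignment: all letters in a single column, no gaps.
trivial = sp_cost(letters, [indices], unit_metric)
best = min(sp_cost(letters, p, unit_metric) for p in set_partitions(indices))
assert trivial == best
```

Any other function satisfying the metric axioms on the letters plus `GAP` can be passed in place of `unit_metric`. Under these assumptions the triangle inequality gives `dist(a, b) <= dist(a, GAP) + dist(GAP, b)`, which is why splitting a pair across two columns cannot lower the cost.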