Sigorskikh Andrey I, Latortseva Daria D, Karyagina Anna S, Spirin Sergey A
Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, 119992, Russia.
Gamaleya National Research Center of Epidemiology and Microbiology, Ministry of Healthcare of the Russian Federation, Moscow, 123098, Russia.
Biochemistry (Mosc). 2022 Dec;87(12):1689-1698. doi: 10.1134/S0006297922120239.
e-mail: sas@belozersky.msu.ru

Protein phylogeny is usually reconstructed from a multiple alignment of amino acid sequences. One problem with such alignments is that they contain regions with different degrees of conservation, including regions where the quality of the alignment itself is questionable. This problem is often addressed by filtering the alignment columns with software developed specifically for this purpose. In this work, we investigated various approaches to phylogeny reconstruction, using proteins with two evolutionary domains as examples. The sequences of such proteins are inherently heterogeneous in their degree of conservation because they contain both the evolutionary domains and the linkers between them, as well as the N- and C-terminal regions. It is shown that filtering the alignment columns improves the quality of reconstruction on average only when full-length sequences are used, and only for eukaryotic proteins. Restricting the alignment to the evolutionary domains, discarding the less conserved linkers and terminal sequences, worsened the quality of phylogenetic reconstruction on average.
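The two alignment-processing strategies compared above can be illustrated with a small sketch. It does not reproduce the filtering software used in the study, which the abstract does not name. Instead it assumes one of the simplest column-filtering criteria, a maximum fraction of gaps per column (the `max_gap` threshold is a hypothetical parameter). It also shows the alternative of keeping only columns inside evolutionary-domain ranges, where the domain coordinates are hypothetical inputs rather than values from the paper.

```python
from typing import Dict, List, Tuple

# Sequence name -> aligned sequence; all sequences must have the same length.
Alignment = Dict[str, str]

GAP_CHARS = set("-.")  # assumption: '-' and '.' both denote gaps


def _check_lengths(alignment: Alignment) -> int:
    """Return the common alignment length, raising if sequences differ."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return lengths.pop()


def _select_columns(alignment: Alignment, keep: List[int]) -> Alignment:
    """Build a new alignment containing only the given column indices."""
    return {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}


def gap_fraction(alignment: Alignment, col: int) -> float:
    """Fraction of sequences that have a gap character in column `col`."""
    seqs = list(alignment.values())
    return sum(1 for s in seqs if s[col] in GAP_CHARS) / len(seqs)


def filter_columns(alignment: Alignment, max_gap: float = 0.5) -> Alignment:
    """Keep columns whose gap fraction does not exceed `max_gap` (hypothetical criterion)."""
    length = _check_lengths(alignment)
    keep = [c for c in range(length) if gap_fraction(alignment, c) <= max_gap]
    return _select_columns(alignment, keep)


def restrict_to_domains(alignment: Alignment, domains: List[Tuple[int, int]]) -> Alignment:
    """Keep only columns inside the given 0-based, end-exclusive domain ranges."""
    length = _check_lengths(alignment)
    keep = [c for start, end in domains for c in range(max(start, 0), min(end, length))]
    return _select_columns(alignment, keep)


if __name__ == "__main__":
    aln = {
        "A": "MK-LV--AGT",
        "B": "MKQLV--AGS",
        "C": "M--LVP-AGT",
    }
    print(filter_columns(aln))                          # full-length alignment, filtered
    print(restrict_to_domains(aln, [(0, 5), (7, 10)]))  # hypothetical domain columns only
```

With the toy alignment, column filtering drops the gap-rich columns wherever they occur, whereas domain restriction drops whole regions (here the hypothetical "linker" columns 5 and 6) regardless of their gap content. Per the abstract, neither choice was uniformly beneficial: filtering helped on average only for full-length eukaryotic sequences, and domain restriction worsened reconstruction on average.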