Shegay Maksim V, Švedas Vytas K, Voevodin Vladimir V, Suplatov Dmitry A, Popova Nina N
Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia.
Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia.
Bioinformatics. 2022 Jan 27;38(4):985-989. doi: 10.1093/bioinformatics/btab798.
With the increasing availability of 3D-data, the focus of comparative bioinformatic analysis is shifting from protein sequence alignments toward more content-rich 3D-alignments. This raises the need for new ways to improve the accuracy of 3D-superimposition.
We proposed guide tree optimization with a genetic algorithm (GA) as a universal tool to systematically improve the alignment quality of multiple protein 3D-structures. As a proof of concept, we implemented the suggested GA-based approach in the popular Matt and Caretta multiple protein 3D-structure alignment (M3DSA) algorithms, leading to a statistically significant improvement of the TM-score quality indicator by up to 220-1523% on the 'SABmark Superfamilies' (in 49-77% of cases) and 'SABmark Twilight' (in 59-80% of cases) datasets. The observed improvement in collections of distant homologs highlights the potential of GA to optimize 3D-alignments of diverse protein superfamilies as one plausible tool to study the structure-function relationship.
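To make the idea of GA-driven guide tree optimization concrete, the following is a minimal sketch, not the gaMatt or gaCaretta implementation. The abstract does not describe the genetic operators, so everything here is an assumption for illustration: guide trees are nested tuples, crossover combines one parent's topology with the other parent's leaf order, mutation swaps two leaves, and `toy_fitness` is a hypothetical stand-in for running the aligner with a given guide tree and scoring the result (for example by TM-score). Because these operators only permute leaves, tree shapes in this sketch come solely from the random initial population.

```python
import random

def random_tree(leaves, rng):
    """Build a random binary guide tree (nested tuples) by joining random subtrees."""
    nodes = list(leaves)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]

def leaves_of(tree):
    """Return leaf labels in left-to-right order."""
    if isinstance(tree, tuple):
        return leaves_of(tree[0]) + leaves_of(tree[1])
    return [tree]

def relabel(tree, labels):
    """Copy the topology of `tree`, taking leaf labels from the iterator `labels`."""
    if isinstance(tree, tuple):
        left = relabel(tree[0], labels)
        right = relabel(tree[1], labels)
        return (left, right)
    return next(labels)

def mutate(tree, rng):
    """Assumed mutation operator: swap two leaves, keeping the topology."""
    labels = leaves_of(tree)
    i, j = rng.sample(range(len(labels)), 2)
    labels[i], labels[j] = labels[j], labels[i]
    return relabel(tree, iter(labels))

def crossover(parent_a, parent_b):
    """Assumed crossover operator: topology of parent_a, leaf order of parent_b."""
    return relabel(parent_a, iter(leaves_of(parent_b)))

def optimize_guide_tree(leaves, fitness, pop_size=20, generations=50,
                        mutation_rate=0.3, seed=0):
    """Elitist GA: keep the better half, refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_tree(leaves, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = crossover(p1, p2)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

# Hypothetical 1D "structure coordinates"; a real fitness would run the
# aligner with the candidate guide tree and return its quality score.
COORDS = {"A": 0.0, "B": 1.0, "C": 10.0, "D": 11.0}

def toy_fitness(tree):
    """Negative sum, over internal nodes, of the gap between child cluster means."""
    def walk(node):
        if not isinstance(node, tuple):
            return [COORDS[node]], 0.0
        left_pos, left_cost = walk(node[0])
        right_pos, right_cost = walk(node[1])
        gap = abs(sum(left_pos) / len(left_pos) - sum(right_pos) / len(right_pos))
        return left_pos + right_pos, left_cost + right_cost + gap
    return -walk(tree)[1]

LEAVES = ["A", "B", "C", "D"]
best = optimize_guide_tree(LEAVES, toy_fitness)
print(best, toy_fitness(best))
```

In a realistic setting the fitness call dominates the run time, since each evaluation is a full multiple structure alignment, so caching scores per tree and evaluating candidates in parallel would be natural refinements.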
The source code of the patched gaCaretta and gaMatt programs is available open-access at https://github.com/n-canter/gamaps.
Supplementary data are available at Bioinformatics online.