Guide tree optimization with genetic algorithm to improve multiple protein 3D-structure alignment.

Author information

Shegay Maksim V, Švedas Vytas K, Voevodin Vladimir V, Suplatov Dmitry A, Popova Nina N

Affiliations

Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia.

Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia.

Publication information

Bioinformatics. 2022 Jan 27;38(4):985-989. doi: 10.1093/bioinformatics/btab798.

Abstract

MOTIVATION

With the increasing availability of 3D-data, the focus of comparative bioinformatic analysis is shifting from protein sequence alignments toward more content-rich 3D-alignments. This raises the need for new ways to improve the accuracy of 3D-superimposition.

RESULTS

We proposed guide tree optimization with a genetic algorithm (GA) as a universal tool to systematically improve the alignment quality of multiple protein 3D-structures. As a proof of concept, we implemented the suggested GA-based approach in the popular Matt and Caretta multiple protein 3D-structure alignment (M3DSA) algorithms, leading to a statistically significant improvement of the TM-score quality indicator by up to 220-1523% on the 'SABmark Superfamilies' (in 49-77% of cases) and 'SABmark Twilight' (in 59-80% of cases) datasets. The improvement observed in these collections of distant homologs highlights the potential of GA to optimize 3D-alignments of diverse protein superfamilies as one plausible tool to study the structure-function relationship.
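
The abstract does not spell out how the GA operates, so the following is only a minimal sketch of the general idea, not the gaMatt or gaCaretta implementation. It assumes a guide tree can be represented as nested tuples of structure indices, uses a leaf-swap mutation with elitist selection (crossover is omitted for brevity), and scores trees with a hypothetical `toy_fitness` function over made-up one-dimensional `coords`. In a real setting the fitness would come from running the progressive aligner along the tree and measuring alignment quality such as the TM-score. All function names here are illustrative.

```python
import random

def random_tree(leaves, rng):
    """Build a random binary guide tree (nested tuples) over the given leaves."""
    nodes = list(leaves)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]

def leaves_of(tree):
    """Return the leaves of a tree in left-to-right order."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves_of(tree[0]) + leaves_of(tree[1])

def relabel(tree, mapping):
    """Return a copy of the tree with leaves renamed according to mapping."""
    if not isinstance(tree, tuple):
        return mapping.get(tree, tree)
    return (relabel(tree[0], mapping), relabel(tree[1], mapping))

def mutate(tree, rng):
    """Swap two randomly chosen leaves; the branching pattern is unchanged."""
    a, b = rng.sample(leaves_of(tree), 2)
    return relabel(tree, {a: b, b: a})

def toy_fitness(tree, coords):
    """Hypothetical stand-in score (NOT the TM-score): merging dissimilar
    groups is penalised, so higher (closer to zero) is better."""
    def walk(node):
        if not isinstance(node, tuple):
            return coords[node], 1, 0.0  # (group mean, group size, cost)
        m1, n1, c1 = walk(node[0])
        m2, n2, c2 = walk(node[1])
        n = n1 + n2
        return (m1 * n1 + m2 * n2) / n, n, c1 + c2 + abs(m1 - m2)
    return -walk(tree)[2]

def optimize_guide_tree(coords, pop_size=20, generations=50, seed=0):
    """Evolve a population of guide trees and return the fittest one."""
    rng = random.Random(seed)
    leaves = list(range(len(coords)))
    population = [random_tree(leaves, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=lambda t: toy_fitness(t, coords), reverse=True)
        survivors = population[: pop_size // 2]
        # Refill the population with mutated copies of survivors.
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda t: toy_fitness(t, coords))

if __name__ == "__main__":
    coords = [0.0, 0.1, 5.0, 5.2, 9.9, 10.0]  # made-up structure descriptors
    best = optimize_guide_tree(coords)
    print(best, toy_fitness(best, coords))
```

Because the best half of each generation is carried over unchanged, the top score in this sketch can never decrease from one generation to the next. Swapping in an expensive, alignment-based fitness function would leave the loop itself unchanged.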

AVAILABILITY AND IMPLEMENTATION

The source code of the patched gaCaretta and gaMatt programs is available open-access at https://github.com/n-canter/gamaps.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

