PPalign: optimal alignment of Potts models representing proteins with direct coupling information.

Affiliation

Univ Rennes, Inria, CNRS, IRISA, Rennes, France.

Publication information

BMC Bioinformatics. 2021 Jun 10;22(1):317. doi: 10.1186/s12859-021-04222-4.

DOI: 10.1186/s12859-021-04222-4

PMID: 34112081

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8191105/

Abstract

BACKGROUND

To assign structural and functional annotations to the ever-increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models, which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Because of these non-local dependencies, aligning Potts models is a hard problem, and it remains the main computational bottleneck for their use.
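As a rough illustration of what a Potts model adds over a position-independent profile, the sketch below scores a sequence with single-site fields (positional composition) plus pairwise couplings (direct couplings between positions). It is a minimal sketch, not PPalign code. The sign convention E(x) = -Σ_i h_i(x_i) - Σ_{i<j} J_ij(x_i, x_j), the nested-list storage, the helper names `zero_couplings` and `potts_energy`, and the toy parameter values are all assumptions made for illustration; in practice the parameters are inferred from a multiple sequence alignment.

```python
def zero_couplings(L, q):
    """L x L grid of q x q coupling matrices, all zero (hypothetical helper)."""
    return [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]


def potts_energy(seq, h, J):
    """Energy of an integer-encoded sequence under a Potts model.

    seq: list of states in range(q), one per position
    h:   h[i][a] is the field of state a at position i
    J:   J[i][j][a][b] is the coupling between state a at i and b at j (i < j)
    Assumed convention: E(x) = -sum_i h_i(x_i) - sum_{i<j} J_ij(x_i, x_j),
    so a lower energy means a more probable sequence, P(x) ~ exp(-E(x)).
    """
    L = len(seq)
    # Single-site term: positional composition, as in a profile.
    field_term = sum(h[i][seq[i]] for i in range(L))
    # Pairwise term: direct couplings between every pair of positions.
    coupling_term = sum(
        J[i][j][seq[i]][seq[j]] for i in range(L) for j in range(i + 1, L)
    )
    return -(field_term + coupling_term)


# Toy model: L = 3 positions, q = 2 states (hypothetical values).
h = [[0.5, 0.0], [0.0, 0.5], [0.25, 0.0]]
J = zero_couplings(3, 2)
J[0][2] = [[1.0, 0.0], [0.0, 1.0]]  # rewards equal states at non-adjacent positions 0 and 2

print(potts_energy([0, 1, 0], h, J))  # -2.25
print(potts_energy([0, 1, 1], h, J))  # -1.0
```

The coupling J_02 links two non-adjacent positions, so changing the residue at position 2 changes the energy through its partner at position 0 as well; a model that scores each column independently cannot express this kind of dependency.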

METHODS

We introduce here an Integer Linear Programming formulation of the problem and PPalign, a program based on this formulation, to compute the optimal pairwise alignment of Potts models representing proteins in tractable time. The approach is assessed on a non-redundant set of reference pairwise sequence alignments from the SISYPHUS benchmark that have the lowest sequence identity (between [Formula: see text] and [Formula: see text]) and make it possible to build reliable Potts models for each sequence to be aligned. This experiment confirms that Potts models can be aligned in reasonable time ([Formula: see text] on average over these alignments). The contribution of couplings is evaluated by comparison with HHalign and with independent-site PPalign. Although the Potts models were not fully optimized for alignment purposes and simple gap scores were used, PPalign yields a better mean [Formula: see text] score and, in some cases, finds significantly better alignments than HHalign and than PPalign without couplings.
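To see why aligning Potts models is harder than aligning profiles, the sketch below scores an alignment of two toy models and finds the best one. Its scoring choices are assumptions, not taken from PPalign: fields of matched positions are compared by a dot product, the couplings of every two matched pairs (i, k) and (j, l) by a Frobenius inner product, and each unmatched position costs a fixed, hypothetical linear gap penalty. Instead of the Integer Linear Programming formulation described above, it uses plain exhaustive enumeration, which is only feasible for tiny models. The names `alignment_score`, `best_alignment`, `use_couplings` and all parameter values are hypothetical.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def frobenius(M, N):
    """Frobenius inner product of two q x q matrices given as nested lists."""
    return sum(dot(row_m, row_n) for row_m, row_n in zip(M, N))


def zero_couplings(L, q):
    """L x L grid of q x q coupling matrices, all zero."""
    return [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]


def alignment_score(matches, hA, JA, hB, JB, gap=0.5, use_couplings=True):
    """Score an order-preserving alignment given as [(i, k), ...] (i in A, k in B)."""
    # Field term: similarity of the field vectors of matched positions.
    score = sum(dot(hA[i], hB[k]) for i, k in matches)
    if use_couplings:
        # Coupling term: every two matches (i, k), (j, l) with i < j and k < l
        # contribute <JA_ij, JB_kl>; this is the non-local part of the score.
        for (i, k), (j, l) in combinations(matches, 2):
            score += frobenius(JA[i][j], JB[k][l])
    # Linear gap cost for each position of A or B left unmatched.
    unmatched = (len(hA) - len(matches)) + (len(hB) - len(matches))
    return score - gap * unmatched


def all_alignments(LA, LB):
    """Yield every order-preserving partial matching of positions."""
    for m in range(min(LA, LB) + 1):
        for cols_a in combinations(range(LA), m):
            for cols_b in combinations(range(LB), m):
                yield list(zip(cols_a, cols_b))


def best_alignment(hA, JA, hB, JB, gap=0.5, use_couplings=True):
    """Exhaustive search (exponential); a stand-in for an exact ILP solver."""
    return max(
        ((alignment_score(al, hA, JA, hB, JB, gap, use_couplings), al)
         for al in all_alignments(len(hA), len(hB))),
        key=lambda pair: pair[0],
    )


# Toy models with q = 2 states (hypothetical values).
hA = [[1.0, 0.0], [0.0, 1.0]]
hB = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.5]]
JA = zero_couplings(2, 2)
JA[0][1] = [[0.0, 1.0], [0.0, 0.0]]
JB = zero_couplings(3, 2)
JB[0][2] = [[0.0, 2.0], [0.0, 0.0]]  # overlaps JA_01 only if A0-B0 and A1-B2 are both matched

print(best_alignment(hA, JA, hB, JB))                       # (3.0, [(0, 0), (1, 2)])
print(best_alignment(hA, JA, hB, JB, use_couplings=False))  # (1.5, [(0, 0), (1, 1)])
```

Because the coupling term involves every pair of matches, the contribution of one match depends on all the others, so the position-by-position dynamic programming used by sequence and profile aligners cannot be applied directly, and the number of candidate alignments grows combinatorially with model length. Setting `use_couplings=False` mimics, in spirit only, the independent-site comparison mentioned in the abstract: on this toy example it selects [(0, 0), (1, 1)] instead of [(0, 0), (1, 2)].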

CONCLUSIONS

These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time. Our experiments nevertheless suggest that new research on the inference of Potts models is now needed to make them more comparable and better suited to homology search. We think that PPalign's guaranteed optimality will be a powerful asset for unbiased investigations in this direction.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/403a/8191105/2d40cad129dd/12859_2021_4222_Fig1_HTML.jpg

Similar articles

Computing posterior probabilities for score-based alignments using ppALIGN.

Stat Appl Genet Mol Biol. 2012 May 16;11(4):Article 1. doi: 10.1515/1544-6115.1702.

Alignment of protein sequences by their profiles.

Protein Sci. 2004 Apr;13(4):1071-87. doi: 10.1110/ps.03379804.

Remote homology search with hidden Potts models.

PLoS Comput Biol. 2020 Nov 30;16(11):e1008085. doi: 10.1371/journal.pcbi.1008085. eCollection 2020 Nov.

Large-scale comparison of protein sequence alignment algorithms with structure alignments.

Proteins. 2000 Jul 1;40(1):6-22. doi: 10.1002/(sici)1097-0134(20000701)40:1<6::aid-prot30>3.0.co;2-7.

Incremental window-based protein sequence alignment algorithms.

Bioinformatics. 2007 Jan 15;23(2):e17-23. doi: 10.1093/bioinformatics/btl297.

Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models.

PLoS Comput Biol. 2016 May 13;12(5):e1004889. doi: 10.1371/journal.pcbi.1004889. eCollection 2016 May.

Using CLUSTAL for multiple sequence alignments.

Methods Enzymol. 1996;266:383-402. doi: 10.1016/s0076-6879(96)66024-8.

Homology-based modeling of 3D structures of protein-protein complexes using alignments of modified sequence profiles.

Int J Biol Macromol. 2008 Aug 15;43(2):198-208. doi: 10.1016/j.ijbiomac.2008.05.004. Epub 2008 May 21.

Fast model-based protein homology detection without alignment.

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Cited by

Harnessing deep learning for proteome-scale detection of amyloid signaling motifs.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i420-i428. doi: 10.1093/bioinformatics/btaf200.

DCAlign v1.0: aligning biological sequences using co-evolution models and informed priors.

Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad537.

Exploring a diverse world of effector domains and amyloid signaling motifs in fungal NLR proteins.

PLoS Comput Biol. 2022 Dec 21;18(12):e1010787. doi: 10.1371/journal.pcbi.1010787. eCollection 2022 Dec.

References

Aligning biological sequences by exploiting residue conservation and coevolution.

Phys Rev E. 2020 Dec;102(6-1):062409. doi: 10.1103/PhysRevE.102.062409.

Remote homology search with hidden Potts models.

PLoS Comput Biol. 2020 Nov 30;16(11):e1008085. doi: 10.1371/journal.pcbi.1008085. eCollection 2020 Nov.

HH-suite3 for fast remote homology detection and deep protein annotation.

BMC Bioinformatics. 2019 Sep 14;20(1):473. doi: 10.1186/s12859-019-3019-7.

How Pairwise Coevolutionary Models Capture the Collective Residue Variability in Proteins?

Mol Biol Evol. 2018 Apr 1;35(4):1018-1027. doi: 10.1093/molbev/msy007.

Uniclust databases of clustered and deeply annotated protein sequences and alignments.

Nucleic Acids Res. 2017 Jan 4;45(D1):D170-D176. doi: 10.1093/nar/gkw1081. Epub 2016 Nov 28.

ACE: adaptive cluster expansion for maximum entropy graphical model inference.

Bioinformatics. 2016 Oct 15;32(20):3089-3097. doi: 10.1093/bioinformatics/btw328. Epub 2016 Jun 21.

New encouraging developments in contact prediction: Assessment of the CASP11 results.

Proteins. 2016 Sep;84 Suppl 1(Suppl 1):131-44. doi: 10.1002/prot.24943. Epub 2015 Nov 17.

MRFy: Remote Homology Detection for Beta-Structural Proteins Using Markov Random Fields and Stochastic Search.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Jan-Feb;12(1):4-16. doi: 10.1109/TCBB.2014.2344682.

CCMpred--fast and precise prediction of protein residue-residue contacts from correlated mutations.

Bioinformatics. 2014 Nov 1;30(21):3128-30. doi: 10.1093/bioinformatics/btu500. Epub 2014 Jul 26.

MRFalign: protein homology detection through alignment of Markov random fields.

PLoS Comput Biol. 2014 Mar 27;10(3):e1003500. doi: 10.1371/journal.pcbi.1003500. eCollection 2014 Mar.
