
Investigating Statistical Conditions of Coevolutionary Signals that Enable Algorithmic Predictions of Protein Partners.

Author information

Fiorote José, Alves João, Stock Letícia, Treptow Werner

Affiliations

Laboratório de Biologia Teórica e Computacional (LBTC), Universidade de Brasília, Brasilia, DF 70910-900, Brasil.

Ben May Department for Cancer Research, University of Chicago, Chicago, Illinois 60637, United States.

Publication information

J Chem Inf Model. 2025 Apr 28;65(8):4107-4115. doi: 10.1021/acs.jcim.5c00052. Epub 2025 Apr 15.

Abstract

This study examines the statistical conditions of coevolutionary signals that allow algorithmic predictions of protein partners based on amino acid sequences rather than 3D structures. It introduces a Markov stochastic model that predicts the number of correct protein partners based on coevolutionary information. The model defines state probabilities using a Poisson mixture of normal distributions, with key parameters including the total number of protein sequences, the coevolutionary information gap α, and the variance σ. The model suggests that algorithmic approaches that maximize coevolutionary information cannot effectively resolve partners in protein families with a large number of sequences (≥ 100). It also shows that true-positive (TP) rates can be enhanced by disregarding mismatches among similar sequences. This approach allows a distinction, in terms of {α, σ}, between optimized solutions with trivial errors and other degenerate solutions. Our findings enable the a priori classification of protein families where partners can be reliably predicted by ignoring trivial errors between similar sequences, advancing the understanding of coevolutionary models for large protein data sets.
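The idea of raising TP rates by disregarding mismatches among similar sequences can be illustrated with a toy calculation. The sketch below is not the authors' model or code: the function names, the aligned-identity measure, and the `identity_cutoff` value are all assumptions made for illustration. It compares a strict TP rate with a relaxed one that forgives a mismatch when the predicted partner is nearly identical to the true partner.

```python
def aligned_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def tp_rates(seqs_b, predicted, identity_cutoff=0.8):
    """Return (strict, relaxed) true-positive rates for a predicted pairing.

    seqs_b[i] is taken to be the true partner of sequence i in the other family.
    predicted[i] is the index into seqs_b assigned to sequence i.
    identity_cutoff is a hypothetical threshold, not a value from the paper.
    """
    n = len(predicted)
    # Strict: only exact partner assignments count as true positives.
    strict = sum(p == i for i, p in enumerate(predicted))
    # Relaxed: a mismatch is forgiven if the assigned partner is
    # nearly identical to the true partner (a "trivial" error).
    relaxed = sum(
        p == i or aligned_identity(seqs_b[p], seqs_b[i]) >= identity_cutoff
        for i, p in enumerate(predicted)
    )
    return strict / n, relaxed / n


# Toy data: the first two partner sequences differ at a single position.
seqs_b = ["ACDEFGHIKL", "ACDEFGHIKV", "WWYYPPTTSS"]
predicted = [1, 0, 2]  # the two similar partners are swapped
strict, relaxed = tp_rates(seqs_b, predicted)
print(f"strict TP rate: {strict:.2f}, relaxed TP rate: {relaxed:.2f}")
```

In this toy case the swap between the two near-identical sequences counts as two errors under the strict rate but as no error under the relaxed rate, which is the kind of trivial error the abstract describes.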

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9eb0/12042258/4227e6adf014/ci5c00052_0006.jpg
