Information assessment on predicting protein-protein interactions.

Authors

Lin Nan, Wu Baolin, Jansen Ronald, Gerstein Mark, Zhao Hongyu

Affiliation

Department of Mathematics, Washington University in St. Louis, St. Louis, MO 63130, USA.

Publication

BMC Bioinformatics. 2004 Oct 18;5:154. doi: 10.1186/1471-2105-5-154.

Abstract

BACKGROUND

Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of significant value, but high-throughput experimental technologies suffer from high rates of both false positives and false negatives. In addition to high-throughput experimental data, many diverse types of genomic data can help predict protein-protein interactions, such as mRNA expression, localization, essentiality, and functional annotation. Evaluating the information contributed by each source of evidence helps to establish more parsimonious models with comparable or better prediction accuracy, and to gain biological insight into the relationships between protein-protein interactions and other genomic information.

RESULTS

Our assessment is based on the genomic features used in a Bayesian network approach to predict protein-protein interactions genome-wide in yeast. In the special case where no feature has missing information, our analysis shows that functional classification contributes more information than expression correlation or essentiality. We also show that in this case alternative models, such as logistic regression and random forests, may be more effective than Bayesian networks for predicting interactions.

CONCLUSIONS

In the restricted problem posed by the complete-information subset, we identified the MIPS and Gene Ontology (GO) functional similarity datasets as the dominant information contributors for predicting protein-protein interactions under the framework proposed by Jansen et al. Random forests based on the MIPS and GO information alone can give highly accurate classifications. In this particular subset of complete information, adding other genomic data does little to improve predictions. We also found that the data discretizations used in the Bayesian methods decreased classification performance.
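One common way to quantify how much a discretized feature tells us about a binary interaction label is mutual information. The sketch below is illustrative only: the abstract does not say which measure the authors used, and the function name, feature names, and toy data are all hypothetical, not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length
    sequences of discrete values, estimated from empirical counts."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data (NOT from the paper): 1 = interacting pair.
interacts = [1, 1, 1, 1, 0, 0, 0, 0]
# Assumed binary feature that mostly tracks the label.
same_function = [1, 1, 1, 0, 0, 0, 0, 1]
# Assumed binary feature that is independent of the label in this sample.
essential = [1, 0, 1, 0, 1, 0, 1, 0]

mi_function = mutual_information(same_function, interacts)
mi_essential = mutual_information(essential, interacts)
print(f"same_function: {mi_function:.4f} bits")
print(f"essential:     {mi_essential:.4f} bits")
```

In this toy sample the feature that tracks the label carries more information than the independent one. Because the estimate depends on how a continuous feature is binned, coarse discretization can discard information, which is consistent with the remark above about discretization and classification performance.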

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/93a8/529436/a06ce6bd440d/1471-2105-5-154-1.jpg
