Bayesian Markov Random Field analysis for protein function prediction based on network data.

Affiliation

Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands.

Publication information

PLoS One. 2010 Feb 24;5(2):e9293. doi: 10.1371/journal.pone.0009293.

Abstract

Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of data typically produced in modern genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions, but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological processes in which a protein is involved. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Field method. We tested our method using a high-quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be applied directly to protein-protein interaction or coexpression networks, and can also be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins, and we evaluate the predictions using the available literature.
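The idea of estimating parameters and predicting functions at the same time can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no model equations, so the three-parameter conditional (`alpha`, `beta1`, `beta0`), the Gaussian prior, the use of a pseudo-likelihood for the parameter update, and the fixed-step (non-adaptive) Metropolis proposal are all assumptions made for illustration, as are the function names and the toy network. The sampler alternates a Gibbs update of the unannotated proteins' labels with a Metropolis update of the parameters, for a single function treated as a binary label.

```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def logit(i, labels, neighbors, theta):
    """Conditional log-odds that protein i has the function, given its neighbors."""
    alpha, beta1, beta0 = theta                 # assumed parameterization
    m1 = sum(labels[j] for j in neighbors[i])   # neighbors with the function
    m0 = len(neighbors[i]) - m1                 # neighbors without it
    return alpha + beta1 * m1 + beta0 * m0


def log_posterior(labels, neighbors, theta, prior_sd=2.0):
    """Log pseudo-likelihood of all labels plus a Gaussian log-prior on theta."""
    total = -sum(t * t for t in theta) / (2.0 * prior_sd ** 2)
    for i in neighbors:
        z = logit(i, labels, neighbors, theta)
        total += log_sigmoid(z) if labels[i] == 1 else log_sigmoid(-z)
    return total


def sample(neighbors, known, n_iter=3000, burn_in=1000, step=0.3, seed=0):
    """Return the estimated posterior probability of the function per unknown protein."""
    rng = random.Random(seed)
    unknown = [i for i in neighbors if i not in known]
    labels = dict(known)
    for i in unknown:
        labels[i] = rng.randint(0, 1)           # random initial labels
    theta = [0.0, 0.0, 0.0]
    counts = {i: 0 for i in unknown}
    for it in range(n_iter):
        # Gibbs step: resample each unknown label from its conditional.
        for i in unknown:
            p1 = math.exp(log_sigmoid(logit(i, labels, neighbors, theta)))
            labels[i] = 1 if rng.random() < p1 else 0
        # Metropolis step: random-walk proposal for the parameters.
        current = log_posterior(labels, neighbors, theta)
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        candidate = log_posterior(labels, neighbors, proposal)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta = proposal
        # After burn-in, accumulate how often each unknown label equals 1.
        if it >= burn_in:
            for i in unknown:
                counts[i] += labels[i]
    return {i: counts[i] / (n_iter - burn_in) for i in unknown}


def clique(nodes):
    """Fully connected toy module: every protein interacts with every other."""
    return {i: [j for j in nodes if j != i] for i in nodes}


# Hypothetical toy network: two modules, one unannotated protein in each.
neighbors = {**clique(["a1", "a2", "a3", "a4", "a5", "u"]),
             **clique(["b1", "b2", "b3", "b4", "b5", "v"])}
known = {f"a{k}": 1 for k in range(1, 6)}
known.update({f"b{k}": 0 for k in range(1, 6)})
posterior = sample(neighbors, known)
```

In this toy setting, `u` interacts only with annotated proteins that have the function and `v` only with proteins that lack it, so `posterior["u"]` should come out higher than `posterior["v"]`. A real application would run one such chain per Gene Ontology term and would need convergence checks that this sketch omits.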
