Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

Author information

Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, United States of America.

Publication information

PLoS Comput Biol. 2011 Jul;7(7):e1002110. doi: 10.1371/journal.pcbi.1002110. Epub 2011 Jul 28.

DOI:10.1371/journal.pcbi.1002110

PMID:21829334

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3145649/

Abstract

Protein-protein interactions (PPIs) are essential to most fundamental cellular processes, and there has been increasing interest in reconstructing PPI networks. However, several critical difficulties stand in the way of reliable predictions. Notably, false positive rates can exceed 80%. Correcting errors at each generating source can be both time-consuming and inefficient, because a single test can rarely cover errors introduced at multiple levels of data processing. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), that lowers the misclassification rate (both false positives and false negatives) by automatically up-weighting the most informative data sources while down-weighting less informative and biased ones. Extensive studies indicate that NBEL is significantly more robust than classic naïve Bayes to unreliable, error-prone, and contaminated data. On a large human data set, NBEL predicts many more PPIs than naïve Bayes, which suggests that previous studies may contain large numbers of not only false positives but also false negatives. Validation on two high-quality human PPI datasets supports these observations. Our experiments demonstrate that it is feasible to predict PPIs computationally at high throughput with substantially fewer false positives and false negatives. The ability to predict large numbers of PPIs both reliably and automatically may encourage the use of computational approaches to correct data errors in general, and may speed up high-quality PPI prediction. Such reliable predictions may provide a solid platform for other studies, such as protein function prediction and the roles of PPIs in disease susceptibility.
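The contrast the abstract draws between classic naïve Bayes and source weighting can be illustrated with a small sketch. This is not the NBEL procedure, which the abstract does not specify. It only shows, under assumed numbers, how shrinking the contribution of an unreliable source changes the posterior for one protein pair. The function names, likelihood values, prior, and weights below are all hypothetical.

```python
import math


def log_likelihood_ratio(p_given_interact, p_given_not):
    """Log of P(evidence | interaction) / P(evidence | no interaction)."""
    return math.log(p_given_interact / p_given_not)


def posterior_probability(prior, llrs, weights=None):
    """Combine per-source log-likelihood ratios into a posterior probability.

    With weights=None every source counts fully, which is classic naive Bayes.
    Weights in [0, 1] shrink a source's contribution. This is an illustrative
    stand-in for source weighting, not the NBEL method itself.
    """
    if weights is None:
        weights = [1.0] * len(llrs)
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(w * llr for w, llr in zip(weights, llrs))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical evidence for one protein pair from three data sources.
llrs = [
    log_likelihood_ratio(0.60, 0.10),  # informative source, supports interaction
    log_likelihood_ratio(0.55, 0.15),  # informative source, supports interaction
    log_likelihood_ratio(0.05, 0.50),  # assumed error-prone source, argues against
]
prior = 0.01  # assumed prior probability that a random pair interacts

# Classic naive Bayes: all three sources are trusted equally.
naive = posterior_probability(prior, llrs)

# Weighted variant: the third source is down-weighted (weight chosen by hand here).
weighted = posterior_probability(prior, llrs, weights=[1.0, 1.0, 0.2])

print(f"naive Bayes posterior: {naive:.4f}")
print(f"weighted posterior:    {weighted:.4f}")
```

In this toy setting the hand-picked weight of 0.2 keeps one error-prone source from overriding two informative ones. A method such as NBEL would presumably learn such weights from the data rather than take them as fixed inputs.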

Similar articles

Heterogeneous data integration by tree-augmented naïve Bayes for protein-protein interactions prediction.

Proteomics. 2013 Jan;13(2):261-8. doi: 10.1002/pmic.201200326. Epub 2012 Dec 3.

Predicting protein-protein interactions between human and hepatitis C virus via an ensemble learning method.

Mol Biosyst. 2014 Dec;10(12):3147-54. doi: 10.1039/c4mb00410h. Epub 2014 Sep 18.

Integrating diverse biological and computational sources for reliable protein-protein interactions.

BMC Bioinformatics. 2010 Oct 15;11 Suppl 7(Suppl 7):S8. doi: 10.1186/1471-2105-11-S7-S8.

Probabilistic prediction and ranking of human protein-protein interactions.

BMC Bioinformatics. 2007 Jul 5;8:239. doi: 10.1186/1471-2105-8-239.

Accurate prediction of protein-protein interactions by integrating potential evolutionary information embedded in PSSM profile and discriminative vector machine classifier.

Oncotarget. 2017 Apr 4;8(14):23638-23649. doi: 10.18632/oncotarget.15564.

Bayesian inference of protein-protein interactions from biological literature.

Bioinformatics. 2009 Jun 15;25(12):1536-42. doi: 10.1093/bioinformatics/btp245. Epub 2009 Apr 15.

Bayesian methods for predicting interacting protein pairs using domain information.

Biometrics. 2007 Sep;63(3):824-33. doi: 10.1111/j.1541-0420.2007.00755.x.

RVMAB: Using the Relevance Vector Machine Model Combined with Average Blocks to Predict the Interactions of Proteins from Protein Sequences.

Int J Mol Sci. 2016 May 18;17(5):757. doi: 10.3390/ijms17050757.

A Two-Stage Geometric Method for Pruning Unreliable Links in Protein-Protein Networks.

IEEE Trans Nanobioscience. 2015 Jul;14(5):528-34. doi: 10.1109/TNB.2015.2420754. Epub 2015 Apr 8.

Cited by

Identification of copper-related biomarkers and potential molecule mechanism in diabetic nephropathy.

Front Endocrinol (Lausanne). 2022 Oct 18;13:978601. doi: 10.3389/fendo.2022.978601. eCollection 2022.

The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets.

Nucleic Acids Res. 2021 Jan 8;49(D1):D605-D612. doi: 10.1093/nar/gkaa1074.
