Issues in performance evaluation for host-pathogen protein interaction prediction.

Author information

Abbasi Wajid Arshad, Minhas Fayyaz Ul Amir Afsar

Affiliations

1 Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan.

Publication information

J Bioinform Comput Biol. 2016 Jun;14(3):1650011. doi: 10.1142/S0219720016500116. Epub 2016 Jan 14.

Abstract

The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein-protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine learning based methods for predicting host-pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance and the performance of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO is more effective in modeling the real world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics such as areas under the precision-recall or receiver operating characteristic curves are not intuitive to biologists and propose simpler and more directly interpretable metrics for this purpose.
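The difference between the two evaluation schemes can be illustrated with a minimal sketch. It is not the authors' implementation: the toy pairs, the function names (`lopo_splits`, `kfold_splits`, `leaked_pathogens`), and the choice of two folds are all assumptions made for illustration. It shows only the data-splitting step. LOPO holds out every pair involving one pathogen protein, whereas a random K-fold split can leave pairs of the same pathogen protein in both the training and the test set.

```python
from collections import defaultdict
import random

# Hypothetical toy data: (pathogen_protein, host_protein, label) triples.
pairs = [
    ("P1", "H1", 1), ("P1", "H2", 0), ("P1", "H3", 1),
    ("P2", "H1", 0), ("P2", "H2", 1), ("P2", "H3", 0),
    ("P3", "H1", 1), ("P3", "H2", 0), ("P3", "H3", 0),
]


def lopo_splits(pairs):
    """Yield (held_out_pathogen, train_idx, test_idx), one split per pathogen protein.

    Every pair that involves the held-out pathogen protein goes to the test
    set, so training sees none of that protein's interactions.
    """
    by_pathogen = defaultdict(list)
    for i, (pathogen, _host, _label) in enumerate(pairs):
        by_pathogen[pathogen].append(i)
    for pathogen, test_idx in by_pathogen.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(pairs)) if i not in held_out]
        yield pathogen, train_idx, test_idx


def kfold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) for a plain K-fold split over shuffled pair indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield train_idx, folds[f]


def leaked_pathogens(pairs, train_idx, test_idx):
    """Return pathogen proteins that appear on both sides of a split."""
    train_p = {pairs[i][0] for i in train_idx}
    test_p = {pairs[i][0] for i in test_idx}
    return train_p & test_p


if __name__ == "__main__":
    for pathogen, train_idx, test_idx in lopo_splits(pairs):
        print("LOPO held out", pathogen, "leaked:", leaked_pathogens(pairs, train_idx, test_idx))
    for train_idx, test_idx in kfold_splits(len(pairs), k=2):
        print("2-fold leaked:", leaked_pathogens(pairs, train_idx, test_idx))
```

In this toy setup each pathogen protein has three pairs and the two folds hold five and four pairs, so neither fold can consist of whole pathogen groups and every K-fold split shares at least one pathogen protein between training and test data. The LOPO splits share none, which is the scenario the abstract describes for proteins with no known interactions.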

