Issues in performance evaluation for host-pathogen protein interaction prediction.

Authors

Abbasi Wajid Arshad, Minhas Fayyaz Ul Amir Afsar

Affiliation

Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan.

Publication information

J Bioinform Comput Biol. 2016 Jun;14(3):1650011. doi: 10.1142/S0219720016500116. Epub 2016 Jan 14.

DOI:10.1142/S0219720016500116

PMID:26932275

Abstract

The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein-protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine learning based methods for predicting host-pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance and the performance of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO is more effective in modeling the real world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics such as areas under the precision-recall or receiver operating characteristic curves are not intuitive to biologists and propose simpler and more directly interpretable metrics for this purpose.
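The difference between the two evaluation schemes can be illustrated with a minimal Python sketch. It is not the authors' code: the toy host/pathogen identifiers, the arbitrary labels, and the helper names `kfold_splits`, `lopo_splits`, and `shared_pathogens` are all assumptions made for illustration. It only shows how the splits differ. Under K-fold cross-validation over protein pairs, a pathogen protein in the test fold usually also appears in the training folds, whereas LOPO holds out every pair involving one pathogen protein at a time.

```python
import random
from collections import defaultdict


def kfold_splits(examples, k, seed=0):
    """Yield (train, test) lists after shuffling pairs into k folds."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for fold in folds:
        held_out = set(fold)
        train = [examples[i] for i in idx if i not in held_out]
        test = [examples[i] for i in fold]
        yield train, test


def lopo_splits(examples):
    """Yield (pathogen, train, test), holding out all pairs of one pathogen protein."""
    by_pathogen = defaultdict(list)
    for ex in examples:
        by_pathogen[ex[1]].append(ex)
    for pathogen in sorted(by_pathogen):
        test = by_pathogen[pathogen]
        train = [ex for ex in examples if ex[1] != pathogen]
        yield pathogen, train, test


def shared_pathogens(train, test):
    """Return pathogen proteins that occur in both the training and test sets."""
    return {ex[1] for ex in train} & {ex[1] for ex in test}


# Hypothetical toy data: (host_protein, pathogen_protein, label).
# Labels are arbitrary placeholders, not real interaction data.
examples = [
    (f"H{h}", f"P{p}", (h + p) % 2)
    for p in range(4)
    for h in range(5)
]

# K-fold over pairs: test pathogen proteins were also seen in training.
kfold_leaks = [bool(shared_pathogens(tr, te)) for tr, te in kfold_splits(examples, k=5)]

# LOPO: the held-out pathogen protein never appears in training.
lopo_leaks = [bool(shared_pathogens(tr, te)) for _, tr, te in lopo_splits(examples)]

print("K-fold folds sharing a pathogen protein with training:", sum(kfold_leaks))
print("LOPO folds sharing a pathogen protein with training:", sum(lopo_leaks))
```

In practice, a predictor would be trained on `train` and scored on `test` within each loop. The sketch leaves out the model so that the only thing it shows is which information each scheme exposes during training.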

Similar articles

Training host-pathogen protein-protein interaction predictors.

J Bioinform Comput Biol. 2018 Aug;16(4):1850014. doi: 10.1142/S0219720018500142. Epub 2018 May 29.

Critical assessment and performance improvement of plant-pathogen protein-protein interaction prediction methods.

Brief Bioinform. 2019 Jan 18;20(1):274-287. doi: 10.1093/bib/bbx123.

Systematic evaluation of machine learning methods for identifying human-pathogen protein-protein interactions.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa068.

Prediction of interactions between viral and host proteins using supervised machine learning methods.

PLoS One. 2014 Nov 6;9(11):e112034. doi: 10.1371/journal.pone.0112034. eCollection 2014.

Predicting protein-protein interactions between human and hepatitis C virus via an ensemble learning method.

Mol Biosyst. 2014 Dec;10(12):3147-54. doi: 10.1039/c4mb00410h. Epub 2014 Sep 18.

Machine-Learning-Based Predictor of Human-Bacteria Protein-Protein Interactions by Incorporating Comprehensive Host-Network Properties.

J Proteome Res. 2019 May 3;18(5):2195-2205. doi: 10.1021/acs.jproteome.9b00074. Epub 2019 Apr 22.

Supervised learning and prediction of physical interactions between human and HIV proteins.

Infect Genet Evol. 2011 Jul;11(5):917-23. doi: 10.1016/j.meegid.2011.02.022. Epub 2011 Mar 5.

A new sequence based encoding for prediction of host-pathogen protein interactions.

Comput Biol Chem. 2019 Feb;78:170-177. doi: 10.1016/j.compbiolchem.2018.12.001. Epub 2018 Dec 5.

Computational prediction of virus-human protein-protein interactions using embedding kernelized heterogeneous data.

Mol Biosyst. 2016 May 24;12(6):1976-86. doi: 10.1039/c6mb00065g.

Cited by

Machine learning methods for protein-protein binding affinity prediction in protein design.

Front Bioinform. 2022 Dec 16;2:1065703. doi: 10.3389/fbinf.2022.1065703. eCollection 2022.

ESIDE: A computationally intelligent method to identify earthworm species (E. fetida) from digital images: Application in taxonomy.

PLoS One. 2021 Sep 16;16(9):e0255674. doi: 10.1371/journal.pone.0255674. eCollection 2021.

COVIDC: An expert system to diagnose COVID-19 and predict its severity using chest CT scans: Application in radiology.

Inform Med Unlocked. 2021;23:100540. doi: 10.1016/j.imu.2021.100540. Epub 2021 Feb 23.

ISLAND: in-silico proteins binding affinity prediction using sequence information.

BioData Min. 2020 Nov 25;13(1):20. doi: 10.1186/s13040-020-00231-w.

Identification and Molecular Characterization of a Pellino Protein in Kuruma Prawn () in Response to White Spot Syndrome Virus and Infection.

Int J Mol Sci. 2020 Feb 13;21(4):1243. doi: 10.3390/ijms21041243.

Learning protein binding affinity using privileged information.

BMC Bioinformatics. 2018 Nov 15;19(1):425. doi: 10.1186/s12859-018-2448-z.

Learned protein embeddings for machine learning.

Bioinformatics. 2018 Aug 1;34(15):2642-2648. doi: 10.1093/bioinformatics/bty178.

Predicting protein-binding regions in RNA using nucleotide profiles and compositions.

BMC Syst Biol. 2017 Mar 14;11(Suppl 2):16. doi: 10.1186/s12918-017-0386-4.
