Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

Author information

Zubek Julian, Tatjewski Marcin, Boniecki Adam, Mnich Maciej, Basu Subhadip, Plewczynski Dariusz

Affiliations

Centre of New Technologies, University of Warsaw, Warsaw, Poland; Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland.

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland.

Publication information

PeerJ. 2015 Jul 2;3:e1041. doi: 10.7717/peerj.1041. eCollection 2015.

DOI: 10.7717/peerj.1041

PMID: 26157620

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4493684/

Abstract

Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB). First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Second, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is made using a 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level, prediction. The level-II predictor improves the results further through a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level, cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods score below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that the multi-scale sequence feature aggregation procedure improves the machine learning results by more than 10% compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
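The abstract describes the two-level pipeline only at a high level. The following is a minimal Python sketch of the general idea under explicit assumptions, not the authors' implementation (their source code is at the project URL given in the abstract): fragments are taken as overlapping windows of a fixed length `w`; a level-I training pair is labeled as interacting if any inter-residue distance falls below an assumed 6 Å cutoff; the trained level-I model is replaced by a placeholder scorer; and the variable-size 2D matrix is collapsed into hypothetical summary features (max, mean, score histogram) for a level-II classifier. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def windows(seq: str, w: int) -> List[str]:
    """All overlapping fragments of length w (sliding step of one residue)."""
    if w < 1 or w > len(seq):
        raise ValueError("window length must be between 1 and len(seq)")
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def segment_pair_label(coords_a: Sequence[Coord], coords_b: Sequence[Coord],
                       cutoff: float = 6.0) -> int:
    """Level-I training label from structure: 1 if any residue of fragment A
    lies closer than `cutoff` (an assumed threshold, in angstroms) to any
    residue of fragment B, else 0."""
    return int(any(math.dist(p, q) < cutoff for p in coords_a for q in coords_b))


def interaction_matrix(seq_a: str, seq_b: str, w: int,
                       scorer: Callable[[str, str], float]) -> List[List[float]]:
    """All-against-all 2D matrix: entry [i][j] is the level-I score for
    fragment i of protein A paired with fragment j of protein B."""
    frags_b = windows(seq_b, w)
    return [[scorer(fa, fb) for fb in frags_b] for fa in windows(seq_a, w)]


def level2_features(matrix: List[List[float]], bins: int = 4) -> List[float]:
    """Collapse a variable-size score matrix into a fixed-length vector
    (max, mean, normalized histogram over [0, 1]) for a level-II classifier."""
    flat = [v for row in matrix for v in row]
    if not flat:
        raise ValueError("empty matrix")
    hist = [0] * bins
    for v in flat:
        # Clamp so that a score of exactly 1.0 falls into the last bin.
        hist[min(max(int(v * bins), 0), bins - 1)] += 1
    n = len(flat)
    return [max(flat), sum(flat) / n] + [h / n for h in hist]


def toy_scorer(a: str, b: str) -> float:
    """Placeholder level-I model: fraction of identical aligned positions.
    A real level-I predictor would be a classifier trained on labeled pairs."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    m = interaction_matrix("MKTAYIAK", "MKTW", 3, toy_scorer)
    print(len(m), len(m[0]))  # 6 fragments of A x 2 fragments of B
    print(level2_features(m))
```

Summarizing the matrix into a fixed-length vector is one simple way to let a standard classifier handle protein pairs of different lengths; the level-II representation used in the paper may differ, and any off-the-shelf classifier could be trained on such vectors.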

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9951/4493684/75ceb42d891c/peerj-03-1041-g001.jpg

Similar articles

An integration of deep learning with feature embedding for protein-protein interaction prediction.

PeerJ. 2019 Jun 17;7:e7126. doi: 10.7717/peerj.7126. eCollection 2019.

Prediction of protein-protein interactions from amino acid sequences using a novel multi-scale continuous and discontinuous feature set.

BMC Bioinformatics. 2014;15 Suppl 15(Suppl 15):S9. doi: 10.1186/1471-2105-15-S15-S9. Epub 2014 Dec 3.

Predicting protein-protein interactions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest.

PLoS One. 2015 May 6;10(5):e0125811. doi: 10.1371/journal.pone.0125811. eCollection 2015.

AMS 4.0: consensus prediction of post-translational modifications in protein sequences.

Amino Acids. 2012 Aug;43(2):573-82. doi: 10.1007/s00726-012-1290-2. Epub 2012 May 4.

Prediction of protein-protein interaction sites from weakly homologous template structures using meta-threading and machine learning.

J Mol Recognit. 2015 Jan;28(1):35-48. doi: 10.1002/jmr.2410.

PPI_SVM: prediction of protein-protein interactions using machine learning, domain-domain affinities and frequency tables.

Cell Mol Biol Lett. 2011 Jun;16(2):264-78. doi: 10.2478/s11658-011-0008-x. Epub 2011 Mar 20.

Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier.

Comput Biol Med. 2020 Aug;123:103899. doi: 10.1016/j.compbiomed.2020.103899. Epub 2020 Jul 15.

Machine-learning techniques for the prediction of protein-protein interactions.

J Biosci. 2019 Sep;44(4).

An interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction.

Biophys Chem. 2021 Nov;278:106666. doi: 10.1016/j.bpc.2021.106666. Epub 2021 Aug 13.

Cited by

Recent advances in deep learning for protein-protein interaction: a review.

BioData Min. 2025 Jun 16;18(1):43. doi: 10.1186/s13040-025-00457-6.

DL-PPI: a method on prediction of sequenced protein-protein interaction based on deep learning.

BMC Bioinformatics. 2023 Dec 14;24(1):473. doi: 10.1186/s12859-023-05594-5.

How to improve the production of peptidyl compounds in filamentous fungi.

Front Fungal Biol. 2022 Dec 22;3:1085624. doi: 10.3389/ffunb.2022.1085624. eCollection 2022.

Neglog: Homology-Based Negative Data Sampling Method for Genome-Scale Reconstruction of Human Protein-Protein Interaction Networks.

Int J Mol Sci. 2019 Oct 12;20(20):5075. doi: 10.3390/ijms20205075.

Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection.

Sci Rep. 2018 Oct 24;8(1):15688. doi: 10.1038/s41598-018-33911-z.

Prediction of Protein-Protein Interactions by Evidence Combining Methods.

Int J Mol Sci. 2016 Nov 22;17(11):1946. doi: 10.3390/ijms17111946.

