PIPENN: protein interface prediction from sequence with an ensemble of neural nets.

Authors

Stringer Bas, de Ferrante Hans, Abeln Sanne, Heringa Jaap, Feenstra K Anton, Haydarlou Reza

Affiliation

Department of Computer Science, IBIVU-Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands.

Publication

Bioinformatics. 2022 Apr 12;38(8):2111-2118. doi: 10.1093/bioinformatics/btac071.

DOI:10.1093/bioinformatics/btac071
PMID:35150231
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9004643/
Abstract

MOTIVATION

The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features.

RESULTS

We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule.

AVAILABILITY AND IMPLEMENTATION

Source code and datasets are available at https://github.com/ibivu/pipenn/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
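
To make the evaluation described in the abstract concrete, the following is a minimal sketch of combining per-residue scores from several predictors and scoring the result with AUC-ROC. It is not the PIPENN implementation: the abstract does not say how the six architectures are combined, so plain averaging is an assumption here, and the function names `ensemble_average` and `auc_roc` and the toy scores are made up for illustration.

```python
from statistics import mean


def ensemble_average(per_model_scores):
    """Average per-residue scores across models.

    Assumption: simple averaging stands in for whatever combination
    rule the real ensemble uses; the abstract does not specify it.
    """
    n_residues = len(per_model_scores[0])
    if any(len(scores) != n_residues for scores in per_model_scores):
        raise ValueError("all models must score the same residues")
    return [mean(column) for column in zip(*per_model_scores)]


def auc_roc(labels, scores):
    """AUC-ROC via its rank interpretation.

    Returns the fraction of (interface, non-interface) residue pairs in
    which the interface residue gets the higher score; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = interface residue, 0 = non-interface residue.
labels = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7]
model_b = [0.6, 0.3, 0.8, 0.2, 0.7, 0.5]

combined = ensemble_average([model_a, model_b])
for name, scores in [("model_a", model_a), ("model_b", model_b), ("ensemble", combined)]:
    print(f"{name}: AUC-ROC = {auc_roc(labels, scores):.3f}")
```

The pairwise AUC computation is quadratic in the number of residues, which is fine for a sketch; on benchmark-sized data a sort-based routine such as scikit-learn's `roc_auc_score` would be the usual choice.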

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f60/9004643/8c87ea277836/btac071f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f60/9004643/31b8e2e216cb/btac071f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f60/9004643/6bd350b259c1/btac071f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f60/9004643/66e79648c448/btac071f4.jpg

Similar articles

1. PIPENN: protein interface prediction from sequence with an ensemble of neural nets.
   Bioinformatics. 2022 Apr 12;38(8):2111-2118. doi: 10.1093/bioinformatics/btac071.
2. Seeing the trees through the forest: sequence-based homo- and heteromeric protein-protein interaction sites prediction using random forest.
   Bioinformatics. 2017 May 15;33(10):1479-1487. doi: 10.1093/bioinformatics/btx005.
3. Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art.
   BMC Bioinformatics. 2012 May 10;13:89. doi: 10.1186/1471-2105-13-89.
4. emPDBA: protein-DNA binding affinity prediction by combining features from binding partners and interface learned with ensemble regression model.
   Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad192.
5. DeepDTA: deep drug-target binding affinity prediction.
   Bioinformatics. 2018 Sep 1;34(17):i821-i829. doi: 10.1093/bioinformatics/bty593.
6. Scoring protein sequence alignments using deep learning.
   Bioinformatics. 2022 May 26;38(11):2988-2995. doi: 10.1093/bioinformatics/btac210.
7. PeNGaRoo, a combined gradient boosting and ensemble learning framework for predicting non-classical secreted proteins.
   Bioinformatics. 2020 Feb 1;36(3):704-712. doi: 10.1093/bioinformatics/btz629.
8. Transfer learning via multi-scale convolutional neural layers for human-virus protein-protein interaction prediction.
   Bioinformatics. 2021 Dec 11;37(24):4771-4778. doi: 10.1093/bioinformatics/btab533.
9. BIPSPI: a method for the prediction of partner-specific protein-protein interfaces.
   Bioinformatics. 2019 Feb 1;35(3):470-477. doi: 10.1093/bioinformatics/bty647.
10. Learning embedding features based on multisense-scaled attention architecture to improve the predictive performance of anticancer peptides.
    Bioinformatics. 2021 Dec 11;37(24):4684-4693. doi: 10.1093/bioinformatics/btab560.

Cited by

1. Large Context, Deeper Insights: Harnessing Large Language Models for Advancing Protein-Protein Interaction Analysis.
   Methods Mol Biol. 2025;2941:243-267. doi: 10.1007/978-1-0716-4623-6_15.
2. HSSPPI: hierarchical and spatial-sequential modeling for PPIs prediction.
   Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf079.
3. PIPENN-EMB ensemble net and protein embeddings generalise protein interface prediction beyond homology.
   Sci Rep. 2025 Feb 5;15(1):4391. doi: 10.1038/s41598-025-88445-y.
4. PMSFF: Improved Protein Binding Residues Prediction through Multi-Scale Sequence-Based Feature Fusion Strategy.
   Biomolecules. 2024 Sep 27;14(10):1220. doi: 10.3390/biom14101220.
5. Prediction of Protein-Protein Interactions Based on Integrating Deep Learning and Feature Fusion.
   Int J Mol Sci. 2024 May 27;25(11):5820. doi: 10.3390/ijms25115820.
6. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review.
   Molecules. 2023 Jul 2;28(13):5169. doi: 10.3390/molecules28135169.
7. Ten quick tips for sequence-based prediction of protein properties using machine learning.
   PLoS Comput Biol. 2022 Dec 1;18(12):e1010669. doi: 10.1371/journal.pcbi.1010669. eCollection 2022 Dec.
8. ProteinGLUE multi-task benchmark suite for self-supervised protein modeling.
   Sci Rep. 2022 Sep 26;12(1):16047. doi: 10.1038/s41598-022-19608-4.
9. Overview of methods for characterization and visualization of a protein-protein interaction network in a multi-omics integration context.
   Front Mol Biosci. 2022 Sep 8;9:962799. doi: 10.3389/fmolb.2022.962799. eCollection 2022.