DIPS-Plus: The enhanced database of interacting protein structures for interface prediction.

Affiliations

University of Missouri, Electrical Engineering & Computer Science, Columbia, MO, 65211, USA.

Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA.

Publication information

Sci Data. 2023 Aug 3;10(1):509. doi: 10.1038/s41597-023-02409-3.

DOI: 10.1038/s41597-023-02409-3
PMID: 37537186
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10400622/
Abstract

In this work, we expand on a dataset recently introduced for protein interface prediction (PIP), the Database of Interacting Protein Structures (DIPS), to present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for machine learning of protein interfaces. While the original DIPS dataset contains only the Cartesian coordinates for atoms contained in the protein complex along with their types, DIPS-Plus contains multiple residue-level features including surface proximities, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid, providing researchers a curated feature bank for training protein interface prediction methods. We demonstrate through rigorous benchmarks that training an existing state-of-the-art (SOTA) model for PIP on DIPS-Plus yields new SOTA results, surpassing the performance of some of the latest models trained on residue-level and atom-level encodings of protein complexes to date.
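To make one of the residue-level features named in the abstract more concrete, the following is a minimal sketch of a half-sphere amino acid composition. It rests on assumptions that the abstract does not state: each residue's neighbors are split into an "up" and a "down" half-sphere by the sign of their projection onto the residue's CA-to-CB vector, neighbors are those with a CA atom within an 8 Å radius, and counts are normalized to frequencies. The function names, the radius, and the toy coordinates are hypothetical and are not taken from the DIPS-Plus code.

```python
from collections import Counter
from math import dist


def normalize(counts):
    """Turn raw neighbor counts into frequencies (empty dict if no neighbors)."""
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


def half_sphere_composition(ca, cb, residue_types, radius=8.0):
    """Return one (up, down) pair of amino-acid frequency dicts per residue.

    ca, cb: lists of (x, y, z) coordinates for each residue's CA and CB atoms.
    residue_types: list of residue names, aligned with ca and cb.
    radius: assumed neighbor cutoff in angstroms (illustrative value).
    """
    results = []
    for i in range(len(ca)):
        # Direction that defines the "up" half-sphere for residue i.
        side = [cb[i][k] - ca[i][k] for k in range(3)]
        up, down = Counter(), Counter()
        for j in range(len(ca)):
            if j == i or dist(ca[i], ca[j]) > radius:
                continue
            offset = [ca[j][k] - ca[i][k] for k in range(3)]
            dot = sum(s * o for s, o in zip(side, offset))
            # Non-negative projection: neighbor lies in the "up" half-sphere.
            (up if dot >= 0 else down)[residue_types[j]] += 1
        results.append((normalize(up), normalize(down)))
    return results


# Toy example with four residues on the x-axis (made-up coordinates).
ca = [(0, 0, 0), (3, 0, 0), (-3, 0, 0), (20, 0, 0)]
cb = [(1, 0, 0), (3, 1, 0), (-3, 1, 0), (20, 1, 0)]
types = ["ALA", "GLY", "LEU", "SER"]
hsaac = half_sphere_composition(ca, cb, types)
print(hsaac[0])
```

In this toy case the first residue sees GLY in its "up" half-sphere and LEU in its "down" half-sphere, while the distant fourth residue has no neighbors within the cutoff. A real pipeline would read the coordinates from structure files and would need a convention for glycine, which has no CB atom.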


Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b87d/10400622/17eb0bed9b8f/41597_2023_2409_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b87d/10400622/d19a9f51cf3b/41597_2023_2409_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b87d/10400622/e8d134a69f29/41597_2023_2409_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b87d/10400622/a7896459cf74/41597_2023_2409_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b87d/10400622/dc517bd8264a/41597_2023_2409_Fig5_HTML.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b87d/10400622/f95a098a5d0a/41597_2023_2409_Fig6_HTML.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b87d/10400622/23d606b1da3e/41597_2023_2409_Fig7_HTML.jpg
