


MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery.

Affiliations

Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany.

TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany.

Publication information

Nat Comput Sci. 2024 May;4(5):367-378. doi: 10.1038/s43588-024-00627-2. Epub 2024 May 10.

DOI:10.1038/s43588-024-00627-2
PMID:38730184
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11136668/
Abstract

Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
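The abstract gives two headline scale figures: over 170 μs of molecular dynamics in total, and ~20,000 protein-ligand complexes. Combining them gives a rough sense of how much sampling exists per complex. The sketch below is only a back-of-envelope calculation on those rounded numbers. It assumes the MD time is spread evenly over all ~20,000 complexes, which the abstract does not say. The function name `mean_trajectory_length_ns` is hypothetical. Per-trajectory lengths and the exact number of simulated complexes should be taken from the paper itself.

```python
# Rounded figures quoted in the abstract ("over 170 μs", "~20,000 complexes").
TOTAL_MD_TIME_US = 170.0   # total MD sampling, in microseconds
N_COMPLEXES = 20_000       # approximate number of protein-ligand complexes


def mean_trajectory_length_ns(total_us: float, n_complexes: int) -> float:
    """Average MD time per complex in nanoseconds (1 μs = 1000 ns).

    Hypothetical helper: assumes the total time is split evenly across complexes.
    """
    if n_complexes <= 0:
        raise ValueError("n_complexes must be positive")
    return total_us * 1000.0 / n_complexes


if __name__ == "__main__":
    avg_ns = mean_trajectory_length_ns(TOTAL_MD_TIME_US, N_COMPLEXES)
    print(f"~{avg_ns:.1f} ns per complex")  # prints "~8.5 ns per complex"
```

Because the abstract says "over" 170 μs, the result is better read as an order of magnitude (several nanoseconds per complex) than as an exact trajectory length.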


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff38/11136668/844c9fc88384/43588_2024_627_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff38/11136668/565e25dfa692/43588_2024_627_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff38/11136668/38e887dff62d/43588_2024_627_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff38/11136668/d84070a7fd11/43588_2024_627_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff38/11136668/006a7a57b4cf/43588_2024_627_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff38/11136668/72bee9bd02e4/43588_2024_627_Fig6_HTML.jpg
