
MoleculeNet: a benchmark for molecular machine learning.

Authors

Wu Zhenqin, Ramsundar Bharath, Feinberg Evan N, Gomes Joseph, Geniesse Caleb, Pappu Aneesh S, Leswing Karl, Pande Vijay

Affiliations

Department of Chemistry, Stanford University, Stanford, CA 94305, USA. Email:

Department of Computer Science, Stanford University, Stanford, CA 94305, USA.

Publication information

Chem Sci. 2017 Oct 31;9(2):513-530. doi: 10.1039/c7sc02664a. eCollection 2018 Jan 14.

DOI: 10.1039/c7sc02664a
PMID: 29629118
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5868307/
Abstract

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
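
The abstract notes that highly imbalanced classification remains difficult and that the benchmark fixes evaluation metrics, without naming them here. As a minimal sketch of why the metric choice matters, the following compares plain accuracy with ROC-AUC on a toy imbalanced task. ROC-AUC is an illustrative assumption, not a metric taken from this abstract, and the labels, scores, and function names are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of correct predictions after thresholding the scores."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: 2 active molecules among 20 (10% positive rate).
labels = [1, 1] + [0] * 18

# A trivial "model" that scores every molecule as inactive.
constant_scores = [0.0] * 20

# A model that ranks both actives above all but one inactive.
ranking_scores = [0.9, 0.4] + [0.6] + [0.1] * 17

for name, scores in [("constant", constant_scores), ("ranking", ranking_scores)]:
    print(name, accuracy(labels, scores), round(roc_auc(labels, scores), 3))
```

On this toy data both models reach the same accuracy of 0.9, but their ROC-AUC values differ sharply (0.5 for the constant model versus roughly 0.972 for the ranking model). This is one reason benchmarks on imbalanced data tend to rely on ranking-based metrics rather than accuracy.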


Figures 1-15 (image links):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/b85a3040968d/c7sc02664a-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/46be69e109e3/c7sc02664a-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/aaeea9fcef61/c7sc02664a-f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/96a59e90adbd/c7sc02664a-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/f300964ddc3c/c7sc02664a-f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/5fffd286d53b/c7sc02664a-f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/8bb0e88b77e8/c7sc02664a-f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/345368cf3150/c7sc02664a-f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/b3fde3163a2d/c7sc02664a-f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/7742a73f2a10/c7sc02664a-f10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/a170c0343359/c7sc02664a-f11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/95b52e070aa6/c7sc02664a-f12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/c74a592fee11/c7sc02664a-f13.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/ae8919903853/c7sc02664a-f14.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c45/5868307/51cfcac6dd54/c7sc02664a-f15.jpg

