Limits of Prediction for Machine Learning in Drug Discovery.

Authors

Modest von Korff, Thomas Sander

Affiliation

Idorsia Pharmaceuticals Ltd., Allschwil, Switzerland.

Publication information

Front Pharmacol. 2022 Mar 10;13:832120. doi: 10.3389/fphar.2022.832120. eCollection 2022.

DOI:10.3389/fphar.2022.832120
PMID:35359835
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8960959/
Abstract

In drug discovery, molecules are optimized towards desired properties. In this context, machine learning is used for extrapolation in drug discovery projects. The limits of extrapolation for regression models are known. However, a systematic analysis of the effectiveness of extrapolation in drug discovery has not yet been performed. In response, this study examined the capabilities of six machine learning algorithms to extrapolate from 243 datasets. The response values calculated from the molecules in the datasets were molecular weight, cLogP, and the number of sp3-atoms. Three experimental setups were chosen for the response values. Shuffled data were used for interpolation, whereas data for extrapolation were sorted from high to low values, and the reverse. Extrapolation with sorted data resulted in much larger prediction errors than interpolation with shuffled data. Additionally, this study demonstrated that linear machine learning methods are preferable for extrapolation.
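The three setups described in the abstract (one shuffled split, two sorted splits) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the data are synthetic with a single hypothetical descriptor `x` and a linear response, and a k-nearest-neighbour regressor stands in for a non-linear method, since the abstract does not name the six algorithms or the descriptors used in the study.

```python
import random
import statistics


def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b (one feature)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour regression: mean response of the k closest points."""
    train = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)

    return predict


def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def evaluate(data, train_fraction=0.8):
    """Train on the first part of `data`, test on the rest, in the given order."""
    n_train = int(len(data) * train_fraction)
    train_x, train_y = zip(*data[:n_train])
    test_x, test_y = zip(*data[n_train:])
    return {
        "linear": rmse(fit_linear(train_x, train_y), test_x, test_y),
        "knn": rmse(fit_knn(train_x, train_y), test_x, test_y),
    }


# Synthetic stand-in for one dataset (hypothetical, not the study's data).
rng = random.Random(0)
data = [(x, 3.0 * x + rng.gauss(0.0, 1.0))
        for x in (rng.uniform(0.0, 10.0) for _ in range(500))]

shuffled = data[:]
rng.shuffle(shuffled)                               # interpolation setup
low_to_high = sorted(data, key=lambda p: p[1])      # test set holds the highest responses
high_to_low = low_to_high[::-1]                     # test set holds the lowest responses

results = {
    "shuffled": evaluate(shuffled),
    "low_to_high": evaluate(low_to_high),
    "high_to_low": evaluate(high_to_low),
}
for name, errors in results.items():
    print(f"{name:12s} linear RMSE={errors['linear']:.2f}  kNN RMSE={errors['knn']:.2f}")
```

In the sorted setups every test response lies outside the range of training responses. A k-nearest-neighbour prediction is an average of training responses, so it cannot leave that range, whereas a fitted line can continue the trend. This toy case is only meant to make the split design concrete; it does not reproduce the paper's results.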

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/2ea145f6c5b2/fphar-13-832120-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/1d70c43de744/fphar-13-832120-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/983ff26ac1fc/fphar-13-832120-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/803eac5ea6db/fphar-13-832120-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/4983a5e31244/fphar-13-832120-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/f89ffdc9edba/fphar-13-832120-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/9e1c4989d492/fphar-13-832120-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/875007eac2d4/fphar-13-832120-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/a793894ccb0e/fphar-13-832120-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/f524845eeda3/fphar-13-832120-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/8960959/abb27ccbfc22/fphar-13-832120-g011.jpg

Similar articles

1
Limits of Prediction for Machine Learning in Drug Discovery.
Front Pharmacol. 2022 Mar 10;13:832120. doi: 10.3389/fphar.2022.832120. eCollection 2022.
2
Error Tolerance of Machine Learning Algorithms across Contemporary Biological Targets.
Molecules. 2019 Jun 4;24(11):2115. doi: 10.3390/molecules24112115.
3
Interpolation and extrapolation problems of multivariate regression in analytical chemistry: benchmarking the robustness on near-infrared (NIR) spectroscopy data.
Analyst. 2012 Apr 7;137(7):1604-10. doi: 10.1039/c2an15972d. Epub 2012 Feb 16.
4
Direct Comparison of Total Clearance Prediction: Computational Machine Learning Model versus Bottom-Up Approach Using In Vitro Assay.
Mol Pharm. 2020 Jul 6;17(7):2299-2309. doi: 10.1021/acs.molpharmaceut.9b01294. Epub 2020 Jun 12.
5
A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery.
Bioinformatics. 2019 Nov 1;35(22):4656-4663. doi: 10.1093/bioinformatics/btz293.
6
Bioactivity Comparison across Multiple Machine Learning Algorithms Using over 5000 Datasets for Drug Discovery.
Mol Pharm. 2021 Jan 4;18(1):403-415. doi: 10.1021/acs.molpharmaceut.0c01013. Epub 2020 Dec 16.
7
Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa321.
8
Prediction of Oral Pharmacokinetics Using a Combination of In Silico Descriptors and In Vitro ADME Properties.
Mol Pharm. 2021 Mar 1;18(3):1071-1079. doi: 10.1021/acs.molpharmaceut.0c01009. Epub 2021 Jan 29.
9
Discovering Highly Potent Molecules from an Initial Set of Inactives Using Iterative Screening.
J Chem Inf Model. 2018 Sep 24;58(9):2000-2014. doi: 10.1021/acs.jcim.8b00376. Epub 2018 Sep 10.
10
A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification.
Metabolomics. 2019 Nov 15;15(12):150. doi: 10.1007/s11306-019-1612-4.

Cited by

1
Discovery of Highly Potent BET Inhibitors based on a Tractable Tricyclic Scaffold.
ACS Med Chem Lett. 2025 Mar 21;16(4):588-595. doi: 10.1021/acsmedchemlett.4c00621. eCollection 2025 Apr 10.
2
Directional Δ Neural Network (DrΔ-Net): A Modular Neural Network Approach to Binding Free Energy Prediction.
J Chem Inf Model. 2024 Mar 25;64(6):1907-1918. doi: 10.1021/acs.jcim.3c02054. Epub 2024 Mar 12.
3
Editorial: Microfluidics and mass spectrometry in drug discovery and development: from synthesis to evaluation.
Front Pharmacol. 2023 May 19;14:1201926. doi: 10.3389/fphar.2023.1201926. eCollection 2023.
