Evaluating parameters for ligand-based modeling with random forest on sparse data sets.

Authors

Kensert Alexander, Alvarsson Jonathan, Norinder Ulf, Spjuth Ola

Affiliations

Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.

Unit of Toxicology Sciences, Karolinska Institutet, Swetox, Forskargatan 20, SE-15136, Södertälje, Sweden.

Publication information

J Cheminform. 2018 Oct 11;10(1):49. doi: 10.1186/s13321-018-0304-9.

DOI:10.1186/s13321-018-0304-9
PMID:30306349
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6755600/
Abstract

Ligand-based predictive modeling is widely used to generate predictive models that aid decision making in, for example, drug discovery projects. Growing data sets and demands for short modeling times make it necessary to analyze data sets efficiently to support rapid and robust modeling. In this study we analyzed four data sets and studied the efficiency of machine learning methods on sparse data structures, using Morgan fingerprints of different radii and hash sizes, and compared them with the molecular signatures descriptor at different heights. We specifically evaluated the effect of these parameters on modeling time, predictive performance, and memory requirements using two implementations of random forest: Scikit-learn and FEST. We also compared with a support vector machine implementation. Our results showed that unhashed fingerprints yield significantly better accuracy than hashed fingerprints ([Formula: see text]), with no pronounced deterioration in modeling time and memory usage. Furthermore, the fast execution and low memory usage of the FEST algorithm suggest that it is a good alternative for large, high-dimensional sparse data. Support vector machines and random forest performed equally well, but the results indicate that the support vector machine was better at using the extra information from larger values of the Morgan fingerprint's radius.
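
The difference between hashed and unhashed fingerprints that the abstract compares can be illustrated with a small sketch. It is not the authors' code: the feature identifiers are made-up integers standing in for Morgan fingerprint substructure identifiers, the hash is assumed to be a plain modulo by the hash size, and the helper names `unhashed_columns` and `to_sparse_rows` are hypothetical. Each molecule is stored sparsely, as the list of its non-zero column indices.

```python
def unhashed_columns(molecules):
    """Give every distinct feature identifier in the data set its own column."""
    index = {}
    for feats in molecules:
        for fid in sorted(feats):
            index.setdefault(fid, len(index))
    return index


def to_sparse_rows(molecules, column_of):
    """Represent each molecule as a sorted list of its non-zero column indices."""
    return [sorted({column_of(fid) for fid in feats}) for feats in molecules]


# Hypothetical feature identifiers (assumption: real ones would come from a
# cheminformatics toolkit; these integers are chosen only to force collisions).
molecules = [
    {3, 1027, 5000},
    {3, 2051},
    {1027, 4099, 5000},
]

n_bits = 1024  # assumed hash size

# Unhashed: one column per distinct identifier, so no information is merged.
index = unhashed_columns(molecules)
unhashed = to_sparse_rows(molecules, index.__getitem__)

# Hashed: identifiers are folded into n_bits columns (assumed modulo hashing),
# so different identifiers can land in the same column.
hashed = to_sparse_rows(molecules, lambda fid: fid % n_bits)

n_unhashed_cols = len(index)
n_hashed_cols = len({col for row in hashed for col in row})
collisions = n_unhashed_cols - n_hashed_cols

print("unhashed rows:", unhashed, "columns:", n_unhashed_cols)
print("hashed rows:  ", hashed, "columns:", n_hashed_cols)
print("identifiers lost to collisions:", collisions)
```

In this toy data the first and third molecules become indistinguishable after hashing, while the unhashed representation keeps them apart at the cost of more, mostly empty, columns. That is the kind of information loss a learner has to cope with when the hash size is small, and it is why a sparse row format matters for the unhashed case.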

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4e7/6755600/870f621ec58c/13321_2018_304_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4e7/6755600/648d7ab703c3/13321_2018_304_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4e7/6755600/40cb1e5aa885/13321_2018_304_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4e7/6755600/6e002c4a1489/13321_2018_304_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4e7/6755600/74ec9263f758/13321_2018_304_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4e7/6755600/2160ef5f103f/13321_2018_304_Fig6_HTML.jpg

Similar articles

1
Evaluating parameters for ligand-based modeling with random forest on sparse data sets.
J Cheminform. 2018 Oct 11;10(1):49. doi: 10.1186/s13321-018-0304-9.
2
Benchmarking the Predictive Power of Ligand Efficiency Indices in QSAR.
J Chem Inf Model. 2016 Aug 22;56(8):1576-87. doi: 10.1021/acs.jcim.6b00136. Epub 2016 Jul 19.
3
Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.
J Chem Inf Model. 2017 Aug 28;57(8):1773-1792. doi: 10.1021/acs.jcim.6b00753. Epub 2017 Aug 2.
4
Large-scale ligand-based predictive modelling using support vector machines.
J Cheminform. 2016 Aug 10;8:39. doi: 10.1186/s13321-016-0151-5. eCollection 2016.
5
Computational models for the classification of mPGES-1 inhibitors with fingerprint descriptors.
Mol Divers. 2017 Aug;21(3):661-675. doi: 10.1007/s11030-017-9743-x. Epub 2017 May 8.
6
Prediction of organic compound aqueous solubility using machine learning: a comparison study of descriptor-based and fingerprints-based models.
J Cheminform. 2023 Oct 18;15(1):99. doi: 10.1186/s13321-023-00752-6.
7
CPSign: conformal prediction for cheminformatics modeling.
J Cheminform. 2024 Jun 28;16(1):75. doi: 10.1186/s13321-024-00870-9.
8
Bioalerts: a python library for the derivation of structural alerts from bioactivity and toxicity data sets.
J Cheminform. 2016 Mar 4;8:13. doi: 10.1186/s13321-016-0125-7. eCollection 2016.
9
Relevance Vector Machines: Sparse Classification Methods for QSAR.
J Chem Inf Model. 2015 Aug 24;55(8):1529-34. doi: 10.1021/acs.jcim.5b00261. Epub 2015 Jul 21.
10
Benchmarking study of parameter variation when using signature fingerprints together with support vector machines.
J Chem Inf Model. 2014 Nov 24;54(11):3211-7. doi: 10.1021/ci500344v. Epub 2014 Oct 28.

Cited by

1
Machine learning prediction of intestinal α-glucosidase inhibitors using a diverse set of ligands: a drug repurposing effort with drugBank database screening.
In Silico Pharmacol. 2025 Jun 25;13(2):95. doi: 10.1007/s40203-025-00384-8. eCollection 2025.
2
Deciphering Molecular Embeddings with Centered Kernel Alignment.
J Chem Inf Model. 2024 Oct 14;64(19):7303-7312. doi: 10.1021/acs.jcim.4c00837. Epub 2024 Sep 25.
3
A novel interpretable machine learning model approach for the prediction of TiO2 photocatalytic degradation of air contaminants.
Sci Rep. 2024 Jun 6;14(1):13070. doi: 10.1038/s41598-024-62450-z.
4
A Machine Learning Method for the Quantitative Detection of Adulterated Meat Using a MOS-Based E-Nose.
Foods. 2022 Feb 20;11(4):602. doi: 10.3390/foods11040602.
5
Antibacterial Activity Prediction of Plant Secondary Metabolites Based on a Combined Approach of Graph Clustering and Deep Neural Network.
Mol Inform. 2022 Jul;41(7):e2100247. doi: 10.1002/minf.202100247. Epub 2022 Jan 28.
6
Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning.
J Cheminform. 2021 Oct 2;13(1):77. doi: 10.1186/s13321-021-00555-7.
7
Graph-based machine learning interprets and predicts diagnostic isomer-selective ion-molecule reactions in tandem mass spectrometry.
Chem Sci. 2020 Oct 5;11(43):11849-11858. doi: 10.1039/d0sc02530e.
8
Assessing the calibration in toxicological in vitro models with conformal prediction.
J Cheminform. 2021 Apr 29;13(1):35. doi: 10.1186/s13321-021-00511-5.
9
SYBA: Bayesian estimation of synthetic accessibility of organic compounds.
J Cheminform. 2020 May 20;12(1):35. doi: 10.1186/s13321-020-00439-2.
10
DeepSnap-Deep Learning Approach Predicts Progesterone Receptor Antagonist Activity With High Performance.
Front Bioeng Biotechnol. 2020 Jan 22;7:485. doi: 10.3389/fbioe.2019.00485. eCollection 2019.

References

1
A confidence predictor for logD using conformal regression and a support-vector machine.
J Cheminform. 2018 Apr 3;10(1):17. doi: 10.1186/s13321-018-0271-1.
2
WDL-RF: predicting bioactivities of ligand molecules acting with G protein-coupled receptors by combining weighted deep learning and random forest.
Bioinformatics. 2018 Jul 1;34(13):2271-2282. doi: 10.1093/bioinformatics/bty070.
3
Ensembles of randomized trees using diverse distributed representations of clinical events.
BMC Med Inform Decis Mak. 2016 Jul 21;16 Suppl 2(Suppl 2):69. doi: 10.1186/s12911-016-0309-0.
4
Benchmarking the Predictive Power of Ligand Efficiency Indices in QSAR.
J Chem Inf Model. 2016 Aug 22;56(8):1576-87. doi: 10.1021/acs.jcim.6b00136. Epub 2016 Jul 19.
5
Benchmarking study of parameter variation when using signature fingerprints together with support vector machines.
J Chem Inf Model. 2014 Nov 24;54(11):3211-7. doi: 10.1021/ci500344v. Epub 2014 Oct 28.
6
QSAR investigation of NaV1.7 active compounds using the SVM/Signature approach and the Bioclipse Modeling platform.
Bioorg Med Chem Lett. 2013 Jan 1;23(1):261-3. doi: 10.1016/j.bmcl.2012.10.102. Epub 2012 Oct 31.
7
Prediction of organ toxicity endpoints by QSAR modeling based on precise chemical-histopathology annotations.
Chem Biol Drug Des. 2012 Sep;80(3):406-16. doi: 10.1111/j.1747-0285.2012.01411.x. Epub 2012 Jun 27.
8
Integrated decision support for assessing chemical liabilities.
J Chem Inf Model. 2011 Aug 22;51(8):1840-7. doi: 10.1021/ci200242c. Epub 2011 Aug 5.
9
Predicting drug-induced hepatotoxicity using QSAR and toxicogenomics approaches.
Chem Res Toxicol. 2011 Aug 15;24(8):1251-62. doi: 10.1021/tx200148a. Epub 2011 Jul 21.
10
ADME evaluation in drug discovery. 9. Prediction of oral bioavailability in humans based on molecular properties and structural fingerprints.
Mol Pharm. 2011 Jun 6;8(3):841-51. doi: 10.1021/mp100444g. Epub 2011 May 16.