
Automatic Prediction of Molecular Properties Using Substructure Vector Embeddings within a Feature Selection Workflow.

Author information

Jung Son Gyo, Jung Guwon, Cole Jacqueline M

Affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.

ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.

Publication information

J Chem Inf Model. 2025 Jan 13;65(1):133-152. doi: 10.1021/acs.jcim.4c01862. Epub 2024 Dec 23.

DOI:10.1021/acs.jcim.4c01862
PMID:39714952
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11733926/
Abstract

Machine learning (ML) methods provide a pathway to accurately predict molecular properties, leveraging patterns derived from structure-property relationships within materials databases. This approach holds significant importance in drug discovery and materials design, where the rapid, efficient screening of molecules can accelerate the development of new pharmaceuticals and chemical materials for highly specialized target applications. Unsupervised and self-supervised learning methods applied to graph-based or geometric models have garnered considerable traction. More recently, transformer-based language models have emerged as powerful tools. Nevertheless, their application entails considerable computational resources, owing to the need for an extensive pretraining process on a vast corpus of unlabeled chemical data sets. To this end, we present a semisupervised strategy that harnesses substructure vector embeddings in conjunction with an ML-based feature selection workflow to predict various molecular and drug properties. We evaluate the efficacy of our modeling methodology across a diverse range of data sets, encompassing both regression and classification tasks. Our findings demonstrate superior performance compared to most existing state-of-the-art algorithms, while offering advantages in terms of balancing model accuracy with computational requirements. Moreover, our approach provides deeper insights into feature interactions that are essential for model interpretability. A case study is conducted to predict the lipophilicity of chemical molecules, exemplifying the robustness of our strategy. The result underscores the importance of meticulous feature analysis and selection over a mere reliance on predictive modeling with a high degree of algorithmic complexity.
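The abstract does not spell out the workflow, so the following is only a minimal sketch of the general idea of pairing substructure vector embeddings with a feature selection step. It assumes a molecule is embedded by summing the vectors of its substructures and that features are ranked by absolute Pearson correlation with the target; both choices, the token names, the toy vectors, and the helper names `embed_molecule` and `select_features` are illustrative assumptions, not the authors' method.

```python
import math

# Hypothetical substructure vectors (toy values, not learned embeddings).
VECTORS = {
    "C": [1.0, 0.0, 2.0],
    "O": [0.0, 1.0, -1.0],
    "N": [1.0, 1.0, 0.5],
}

def embed_molecule(substructures, vectors):
    """Embed a molecule as the sum of its substructure vectors (assumed scheme)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for token in substructures:
        for i, value in enumerate(vectors[token]):
            total[i] += value
    return total

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_features(X, y, k):
    """Return the indices of the k features most correlated with y (a simple
    stand-in filter, not the feature selection workflow of the paper)."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Toy molecules written as lists of hypothetical substructure tokens.
MOLECULES = [
    ["C", "C", "O"], ["C", "O", "O"], ["N", "C"],
    ["N", "O", "O"], ["C", "C", "C"], ["N", "N"],
]
X = [embed_molecule(m, VECTORS) for m in MOLECULES]
# Toy target constructed from embedding dimension 1, so the filter has a signal to find.
y = [2.0 * row[1] for row in X]

print(select_features(X, y, 1))
```

In practice the substructure vectors would come from a pretrained embedding model and the filter would be replaced by a more careful selection procedure; the sketch only shows how the two stages connect.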


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ce/11733926/7a5e5ad9c1f8/ci4c01862_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ce/11733926/9442cbbe914c/ci4c01862_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ce/11733926/d6ce00f45a4e/ci4c01862_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ce/11733926/632f0632b72d/ci4c01862_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ce/11733926/fe031c4d11bc/ci4c01862_0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ce/11733926/19f2fb68c280/ci4c01862_0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ce/11733926/23311fed3a2e/ci4c01862_0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ce/11733926/4267e4331dcf/ci4c01862_0008.jpg


