The topology of molecular representations and its influence on machine learning performance.

Authors

Rottach Florian, Schieferdecker Sebastian, Eickhoff Carsten

Affiliations

Central Data Science, Boehringer Ingelheim GmbH, Biberach/Riss, Germany.

School of Medicine, University of Tübingen, Tübingen, Germany.

Publication

J Cheminform. 2025 Jul 21;17(1):109. doi: 10.1186/s13321-025-01045-w.

DOI: 10.1186/s13321-025-01045-w
PMID: 40691856
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12281805/

Abstract

Advancements in cheminformatics have led to numerous methods for encoding molecules numerically. The choice of molecular representation impacts the accuracy and generalizability of learning algorithms applied to chemical datasets. Designing and selecting the appropriate representation often lacks a systematic approach and follows computationally exhaustive empirical testing. Moreover, research has shown that deep learning models do not substantially outperform traditional approaches across many tasks, with no clear explanation for this shortfall. In this work, we present TopoLearn, a model that predicts the effectiveness of representations on datasets based on the topological characteristics of the corresponding feature space. Using interpretability techniques, we find that persistent homology descriptors are linked with the error metrics of trained machine learning models, offering a new method to better understand and select molecular representations.

Scientific contribution: Our research is the first to establish an empirical connection between the topology of feature spaces and the machine learning performance of molecular representations. In addition, we facilitate future research endeavors by providing open access to our developed model.
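To make the idea of a "persistent homology descriptor" of a feature space more concrete, the following is a minimal sketch, not TopoLearn itself. The abstract does not say which descriptors the authors use, so the example assumes the simplest case: 0-dimensional persistence of a Vietoris-Rips filtration, where every point is born at scale 0 and a connected component dies at the scale at which it merges with another. Those merge scales equal the edge lengths of a minimum spanning tree, so they can be computed with Kruskal's algorithm. The function names, the summary statistics, and the toy point cloud are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Finite death times of 0-dimensional persistence classes.

    Assumes a Vietoris-Rips filtration with Euclidean distance. The merge
    scales of connected components are the minimum-spanning-tree edge
    lengths, found here with Kruskal's algorithm and a union-find structure.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (fine for small point clouds).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale: one H0 class dies.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def h0_summary(points):
    """Hypothetical summary descriptors of the finite H0 bars."""
    deaths = h0_death_times(points)
    if not deaths:
        return {"n_finite_bars": 0, "mean_lifetime": 0.0, "max_lifetime": 0.0}
    return {
        "n_finite_bars": len(deaths),
        "mean_lifetime": sum(deaths) / len(deaths),
        "max_lifetime": max(deaths),
    }


# Hypothetical toy "feature space": two tight clusters that are far apart.
toy_points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(h0_summary(toy_points))
```

In this toy example the one long-lived bar reflects the gap between the two clusters, while the short bars reflect within-cluster spacing. Summary values of this kind could in principle be fed to a downstream model as features describing a representation's geometry; whether and how TopoLearn does so is not stated in the abstract.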


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/7c6a0519adba/13321_2025_1045_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/57e7246e906e/13321_2025_1045_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/a323c01f6a1b/13321_2025_1045_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/be977a852d46/13321_2025_1045_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/9101bff40aeb/13321_2025_1045_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/3a2d0388394e/13321_2025_1045_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/ef8ed5cc40ab/13321_2025_1045_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/c1bca0382dec/13321_2025_1045_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/0460204e446b/13321_2025_1045_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/511b66050753/13321_2025_1045_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/9e8ed6b886ae/13321_2025_1045_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/f26e62b88973/13321_2025_1045_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/0e4eef2dd8aa/13321_2025_1045_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/d00322972e94/13321_2025_1045_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/74b6adc363a9/13321_2025_1045_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/ba4c624c148a/13321_2025_1045_Fig16_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/d341901a7f92/13321_2025_1045_Fig17_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/3fd8eb334c1a/13321_2025_1045_Fig18_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/ba1dd38841a4/13321_2025_1045_Fig19_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/b5c4e6a07d13/13321_2025_1045_Fig20_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0249/12281805/a29226e4240e/13321_2025_1045_Fig21_HTML.jpg
