
Structural Bias in Three-Dimensional Autoregressive Generative Machine Learning of Organic Molecules.

Authors

Koczor-Benda Zsuzsanna, Gilkes Joe, Bartucca Francesco, Al-Fekaiki Abdulla, Maurer Reinhard J

Affiliations

Department of Chemistry, University of Warwick, Coventry CV4 7AL, U.K.

Centre for Doctoral Training in Modelling of Heterogeneous Systems, University of Warwick, Coventry CV4 7AL, U.K.

Publication information

J Chem Inf Model. 2025 Jul 14;65(13):6644-6654. doi: 10.1021/acs.jcim.5c00665. Epub 2025 Jun 24.

DOI: 10.1021/acs.jcim.5c00665
PMID: 40556385
Abstract

A range of generative machine learning models for the design of novel molecules and materials have been proposed in recent years. Models that can generate three-dimensional structures are particularly suitable for quantum chemistry workflows, enabling direct property prediction. The performance of generative models is typically assessed based on their ability to produce novel, valid, and unique molecules. However, equally important is their ability to learn the prevalence of functional groups and certain chemical moieties in the underlying training data, that is, to faithfully reproduce the chemical space spanned by the training data. Here, we investigate the ability of the autoregressive generative machine learning model G-SchNet to reproduce the chemical space and property distributions of training data sets composed of large, functional organic molecules. We assess the elemental composition, size- and bond-length distributions, as well as the functional group and chemical space distribution of training and generated molecules. By principal component analysis of the chemical space, we find that the model leads to a biased generation that is largely unaffected by the choice of hyperparameters or the training data set distribution, producing molecules that are, on average, less saturated and contain more heteroatoms. Purely aliphatic molecules are mostly absent in the generation. We further investigate generation with functional group constraints and based on composite data sets, which can help to partially remedy the model generation bias. Decision tree models can recognize the generation bias in the models and discriminate between training and generated data, revealing key chemical differences between the two sets. The chemical differences we find affect the distributions of electronic properties such as the HOMO-LUMO gap, which is a common target for functional molecule design.
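
The idea of using a decision tree to tell training molecules from generated ones can be illustrated with a minimal sketch. The code below is not the authors' method or data: it fits a single-split tree (a decision stump) in plain Python to synthetic values of two hypothetical descriptors, `heteroatom_fraction` and `sp3_carbon_fraction`. The means and spreads are invented purely so that the "generated" set is less saturated and richer in heteroatoms, as the abstract describes qualitatively.

```python
import random


def best_stump(rows, labels):
    """Return (feature_index, threshold, accuracy) of the best single split.

    A sample is assigned to one class when its feature value exceeds the
    threshold; the orientation (above = 1 or above = 0) that gives the
    higher accuracy is used.
    """
    n = len(rows)
    best = (None, None, 0.0)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            correct = sum((r[j] > t) == bool(y) for r, y in zip(rows, labels))
            acc = max(correct, n - correct) / n
            if acc > best[2]:
                best = (j, t, acc)
    return best


# Hypothetical descriptor names, chosen for illustration only.
FEATURES = ["heteroatom_fraction", "sp3_carbon_fraction"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def sample(n, hetero_mean, sp3_mean):
    """Draw n synthetic descriptor rows, clipped to the range [0, 1]."""
    def clip(x):
        return min(1.0, max(0.0, x))
    return [
        [clip(rng.gauss(hetero_mean, 0.05)), clip(rng.gauss(sp3_mean, 0.10))]
        for _ in range(n)
    ]


# Invented distributions: label 0 = "training", label 1 = "generated".
training = sample(200, hetero_mean=0.15, sp3_mean=0.50)
generated = sample(200, hetero_mean=0.25, sp3_mean=0.35)

rows = training + generated
labels = [0] * len(training) + [1] * len(generated)

feature, threshold, accuracy = best_stump(rows, labels)
print(f"best split: {FEATURES[feature]} at {threshold:.3f}, accuracy {accuracy:.3f}")
```

An accuracy well above 0.5 means the two sets are distinguishable from that descriptor alone, and the chosen feature and threshold indicate where they differ. A full analysis would use real descriptors computed from the molecules and a deeper tree.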


