Exploring Molecular Heteroencoders with Latent Space Arithmetic: Atomic Descriptors and Molecular Operators.

Authors

Gao Xinyue, Baimacheva Natalia, Aires-de-Sousa Joao

Affiliations

Faculty of Sciences, Université Paris Cité, 75013 Paris, France.

Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081 Strasbourg, France.

Publication

Molecules. 2024 Aug 22;29(16):3969. doi: 10.3390/molecules29163969.

DOI: 10.3390/molecules29163969
PMID: 39203047
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11357237/
Abstract

A variational heteroencoder based on recurrent neural networks, trained with SMILES linear notations of molecular structures, was used to derive atomic descriptors: delta latent space vectors (DLSVs), obtained as the difference between the latent space vector (LSV) of the original SMILES of the whole molecule and the LSV of the SMILES of the same molecule with the target atom replaced. Three kinds of replacement were explored: changing the atomic element, substituting a character of the model vocabulary that was not used in the training set, and removing the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed remarkable clustering according to atomic element, hybridization, atom type, and aromaticity. Atomic DLSV descriptors were used to train machine learning (ML) models to predict ¹⁹F NMR chemical shifts. An R² of up to 0.89 and mean absolute errors of up to 5.5 ppm were obtained for an independent test set of 1046 molecules with random forests or a gradient-boosting regressor. Intermediate representations from a Transformer model yielded comparable results. Furthermore, DLSVs were applied as molecular operators in the latent space: the DLSV of a halogenation (H→F substitution) was added to the LSVs of 4135 new molecules with no fluorine atom, and the sums were decoded into SMILES. This yielded 99% valid SMILES; 75% of the SMILES incorporated fluorine, and 56% of the structures incorporated fluorine with no other structural change.
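
The abstract lists three ways of replacing the target atom before the modified SMILES is re-encoded. The sketch below shows one way such modified strings could be produced at the string level. The simplified tokenizer regex, the function names, and the `?` placeholder (standing in for "a character of the model vocabulary not used in the training set") are assumptions for illustration, not the authors' code.

```python
import re

# Simplified SMILES tokenizer (an assumption, not the paper's tokenizer).
# Two-letter halogens are tried before one-letter symbols, and bracket atoms
# such as [nH] or [C@@H] are kept as single tokens.
ATOM = r"\[[^\]]+\]|Br|Cl|B|C|N|O|S|P|F|I|b|c|n|o|s|p"
OTHER = r"\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|\*|\$|%[0-9]{2}|[0-9]"
SMILES_TOKEN = re.compile("(" + ATOM + "|" + OTHER + ")")
ATOM_TOKEN = re.compile("(?:" + ATOM + ")")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; fail loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize {smiles!r}")
    return tokens


def replacement_variants(smiles: str, atom_number: int,
                         new_element: str = "Cl",
                         placeholder: str = "?") -> dict[str, str]:
    """Return the three kinds of modified SMILES for the atom_number-th atom
    (0-based, counted in SMILES order). No chemical validation is done."""
    tokens = tokenize(smiles)
    atom_positions = [i for i, t in enumerate(tokens) if ATOM_TOKEN.fullmatch(t)]
    pos = atom_positions[atom_number]

    def with_token(new: str) -> str:
        # Swap the target atom token; an empty string deletes it.
        return "".join(tokens[:pos] + [new] + tokens[pos + 1:])

    return {
        "element_change": with_token(new_element),
        "unused_character": with_token(placeholder),
        "removal": with_token(""),
    }


print(replacement_variants("CC(=O)F", atom_number=3))
# -> {'element_change': 'CC(=O)Cl', 'unused_character': 'CC(=O)?', 'removal': 'CC(=O)'}
```

Working on tokens rather than raw characters keeps two-letter symbols such as `Cl` and bracket atoms intact. Removing an atom this way can leave ring-closure digits or branches that no longer describe a sensible molecule; the sketch does not check for that.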

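A DLSV is a difference of two latent vectors, one per SMILES string. The sketch below assumes a generic `encode` callable that maps a SMILES string to a fixed-length vector (in the paper, the trained RNN heteroencoder). `toy_encode`, a character-count vector, is a hypothetical stand-in used only so the example runs. The sign convention (original minus modified) is also an assumption; the reverse order only flips the sign.

```python
from typing import Callable, Sequence

Encoder = Callable[[str], Sequence[float]]


def dlsv(encode: Encoder, original: str, modified: str) -> list[float]:
    """Delta latent space vector: LSV(original) - LSV(modified) (assumed order)."""
    z_orig = encode(original)
    z_mod = encode(modified)
    if len(z_orig) != len(z_mod):
        raise ValueError("encoder returned vectors of different lengths")
    return [a - b for a, b in zip(z_orig, z_mod)]


# Hypothetical stand-in for the trained heteroencoder: counts of a few
# single-character tokens. This is NOT a learned latent space.
TOY_VOCAB = ("C", "N", "O", "F", "(", ")", "=", "?")


def toy_encode(smiles: str) -> list[float]:
    return [float(smiles.count(ch)) for ch in TOY_VOCAB]


# Atomic descriptor for the fluorine of acetyl fluoride, using the
# "unused vocabulary character" replacement.
descriptor = dlsv(toy_encode, "CC(=O)F", "CC(=O)?")
print(descriptor)  # -> [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0]
```

Computing one such vector per target atom and stacking them gives a descriptor matrix of the kind the abstract maps with t-SNE and uses to train the regression models.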
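
The ¹⁹F NMR results are summarized by a correlation statistic and the mean absolute error (MAE) on an independent test set. As a worked reference, the sketch below computes MAE, the Pearson correlation r, and the coefficient of determination R² = 1 − SS_res/SS_tot from paired experimental and predicted shifts. The abstract does not spell out the exact definitions the authors used, so treat this choice as an assumption; the shift values are made up for illustration.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the inputs (ppm for shifts)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observations and predictions."""
    _check(y_true, y_pred)
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up 19F shifts in ppm, for illustration only.
experimental = [-63.0, -110.5, -120.0, -75.2]
predicted = [-60.0, -115.5, -118.0, -80.2]
print(round(mae(experimental, predicted), 2))  # -> 3.75
```

Pearson r ignores a constant offset or scale error in the predictions, whereas R² as defined here penalizes it (and can even be negative), so the two statistics are not interchangeable when comparing against the reported 0.89.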

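The molecular-operator experiment is latent-space arithmetic: an H→F DLSV is added to the LSV of a new molecule and the sum is decoded. The sketch below shows that step together with the validity and fluorine-incorporation rates. The four-molecule `CODEBOOK`, its 2-D coordinates, and the nearest-neighbour `codebook_decode` are hypothetical stand-ins for the trained RNN encoder and decoder, chosen so the arithmetic is easy to follow. With RDKit, `Chem.MolFromSmiles(s) is not None` could serve as `is_valid`. The "no other structural change" check is omitted because it needs a chemistry toolkit.

```python
import math
import re
from typing import Callable, Sequence


def apply_operator(z: Sequence[float], operator: Sequence[float]) -> list[float]:
    """Latent-space arithmetic: add an operator DLSV to a molecule's LSV."""
    if len(z) != len(operator):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(z, operator)]


def contains_fluorine(smiles: str) -> bool:
    """String-level check: an 'F' that does not start Fe, Fl, Fm or Fr."""
    return re.search(r"F(?![elmr])", smiles) is not None


def summarize(outputs: Sequence[str],
              is_valid: Callable[[str], bool]) -> dict[str, float]:
    """Fractions of decoded SMILES that are valid and that contain fluorine."""
    if not outputs:
        raise ValueError("no decoded SMILES to summarize")
    n = len(outputs)
    return {
        "valid": sum(map(is_valid, outputs)) / n,
        "with_fluorine": sum(map(contains_fluorine, outputs)) / n,
    }


# Hypothetical 2-D latent coordinates for four molecules (toy values).
CODEBOOK = {
    "c1ccccc1": (0.0, 0.0),      # benzene
    "Fc1ccccc1": (1.0, 0.0),     # fluorobenzene
    "Cc1ccccc1": (0.0, 1.0),     # toluene
    "Cc1ccc(F)cc1": (1.0, 1.0),  # 4-fluorotoluene
}


def codebook_encode(smiles: str) -> list[float]:
    return list(CODEBOOK[smiles])


def codebook_decode(z: Sequence[float]) -> str:
    # Nearest codebook entry; an RNN decoder would instead generate SMILES.
    return min(CODEBOOK, key=lambda s: math.dist(CODEBOOK[s], z))


# H->F operator, taken here as LSV(fluorinated) - LSV(parent).
h_to_f = [f - h for f, h in zip(codebook_encode("Fc1ccccc1"),
                                codebook_encode("c1ccccc1"))]

decoded = [codebook_decode(apply_operator(codebook_encode(s), h_to_f))
           for s in ["Cc1ccccc1"]]
print(decoded)  # -> ['Cc1ccc(F)cc1']
print(summarize(decoded, is_valid=lambda s: s in CODEBOOK))
# -> {'valid': 1.0, 'with_fluorine': 1.0}
```

Because the operator is a single fixed vector, the same `h_to_f` can be added to every molecule in a batch; how well the decoded outputs keep the rest of the structure is exactly what the 75% and 56% figures measure.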
Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0437/11357237/e2e81d84ffc2/molecules-29-03969-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0437/11357237/4df3e080d520/molecules-29-03969-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0437/11357237/0291244b75c0/molecules-29-03969-g003.jpg
