A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation.

Affiliations

Yunnan Minzu University, Kunming, China.

School of Informatics, Xiamen University, Xiamen, China.

Publication information

Comput Intell Neurosci. 2022 Jan 28;2022:8464452. doi: 10.1155/2022/8464452. eCollection 2022.

DOI:10.1155/2022/8464452

PMID:35178082

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8843876/

Abstract

Deep learning has driven rapid progress in molecular representation for tasks such as molecular property prediction. Predicting molecular properties is a crucial task in drug discovery, where the goal is to find drugs with good pharmacological activity and pharmacokinetic properties. Inspired by natural language processing techniques, deep neural network models commonly take SMILES strings as character-level input. However, these models are hindered by the nonunique nature of SMILES: a single molecule can be written as many different strings. To learn molecular features efficiently along all message paths, in this paper we encode multiple SMILES for every molecule as an automated data augmentation for molecular property prediction, which alleviates the overfitting caused by the small size of molecular property datasets. With this multiple SMILES-based augmentation, we obtained better molecular representations and superior performance on molecular property prediction tasks.
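To make the augmentation idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the equivalent SMILES strings for each molecule are already available (in practice a cheminformatics toolkit would enumerate them). The molecules, labels, and the names `EQUIVALENT_SMILES` and `augment` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hand-written equivalent SMILES for two small molecules.
# Each list spells the same molecule starting from a different atom,
# which is exactly the nonuniqueness the abstract refers to.
EQUIVALENT_SMILES: Dict[str, List[str]] = {
    "ethanol": ["CCO", "OCC", "C(O)C"],
    "acetic acid": ["CC(=O)O", "OC(C)=O", "C(C)(=O)O"],
}


def augment(
    dataset: List[Tuple[str, float]],
    variants: Dict[str, List[str]],
    max_per_molecule: int = 3,
) -> List[Tuple[str, float]]:
    """Expand (molecule, label) pairs into (SMILES, label) pairs.

    Every SMILES variant of a molecule inherits that molecule's label,
    so the model sees several spellings of each training example.
    """
    augmented: List[Tuple[str, float]] = []
    for name, label in dataset:
        for smiles in variants[name][:max_per_molecule]:
            augmented.append((smiles, label))
    return augmented


# Placeholder labels, not real measurements.
dataset = [("ethanol", 0.0), ("acetic acid", 1.0)]
augmented = augment(dataset, EQUIVALENT_SMILES)

for smiles, label in augmented:
    print(smiles, label)
```

Because all variants of one molecule share a label, a train/test split should be made per molecule before augmenting, so that different spellings of the same molecule do not end up on both sides of the split.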
