An NLP-based technique to extract meaningful features from drug SMILES.

Authors

Sharma Rahul, Saghapour Ehsan, Chen Jake Y

Affiliation

Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA.

Publication information

iScience. 2024 Feb 8;27(3):109127. doi: 10.1016/j.isci.2024.109127. eCollection 2024 Mar 15.

DOI: 10.1016/j.isci.2024.109127
PMID: 38455979
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10918220/
Abstract

NLP is a well-established field in ML for developing language models that capture the sequence of words in a sentence. Similarly, drug molecule structures can also be represented as sequences using the SMILES notation. However, unlike natural language texts, special characters in drug SMILES have specific meanings and cannot be ignored. We introduce a novel NLP-based method that extracts interpretable sequences and essential features from drug SMILES notation using N-grams. Our method compares these features to Morgan fingerprint bit-vectors using UMAP-based embedding, and we validate its effectiveness through two personalized drug screening (PSD) case studies. Our NLP-based features are sparse and, when combined with gene expressions and disease phenotype features, produce better ML models for PSD. This approach provides a new way to analyze drug molecule structures represented as SMILES notation, which can help accelerate drug discovery efforts. We have also made our method accessible through a Python library.
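
The N-gram idea in the abstract can be illustrated with a minimal sketch. This is not the authors' Python library or its API: the token pattern, the function names (`tokenize`, `ngram_counts`, `feature_matrix`), and the default N-gram range are all assumptions chosen for illustration. The sketch keeps special characters (brackets, bond symbols, ring-closure digits) as tokens rather than discarding them, then counts contiguous token N-grams to build a count matrix in which most entries are zero.

```python
import re
from collections import Counter

# Assumed token pattern (illustrative, not from the paper): bracket atoms,
# two-letter halogens, two-digit ring closures such as %12, then single
# letters, digits, and bond/branch/stereo symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/().:@~*$]"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, keeping special characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse input that the pattern cannot cover completely.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def ngram_counts(smiles, n_min=1, n_max=3):
    """Count contiguous token N-grams for n_min <= N <= n_max."""
    tokens = tokenize(smiles)
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts["".join(tokens[i:i + n])] += 1
    return counts


def feature_matrix(smiles_list, n_min=1, n_max=3):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in drug i."""
    per_drug = [ngram_counts(s, n_min, n_max) for s in smiles_list]
    vocabulary = sorted(set().union(*per_drug))
    rows = [[counts[gram] for gram in vocabulary] for counts in per_drug]
    return vocabulary, rows
```

Calling `feature_matrix(["CCO", "CC(=O)Oc1ccccc1C(=O)O"])` returns one column per distinct N-gram across both molecules. Each column name is itself a readable SMILES fragment, which is what makes such features interpretable. In a screening model these columns could be concatenated with gene expression and phenotype features, as the abstract describes.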

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bf/10918220/f19d0d207df1/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bf/10918220/79283fc71ea3/fx1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bf/10918220/efedaab9a02e/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bf/10918220/a8bb320070f6/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bf/10918220/d7263d340bac/gr3.jpg
