Representation of Molecules by Sequences of Instructions.

Author information

Thurnhofer-Hemsi Karl, García-Aguilar Iván, Fernández-Rodriguez José David, López-Rubio Ezequiel

Affiliations

ITIS Software, Universidad de Málaga, C/Arquitecto Francisco Peñalosa 18, Málaga 29010, Spain.

Biomedic Research Institute of Málaga, IBIMA Plataforma BIONAND, C/Doctor Miguel Díaz Recio, 28, Málaga 29010, Spain.

Publication information

J Chem Inf Model. 2025 Aug 11;65(15):7936-7955. doi: 10.1021/acs.jcim.5c00354. Epub 2025 Jul 28.

DOI: 10.1021/acs.jcim.5c00354
PMID: 40720985

Abstract

The processing of chemical information by computational intelligence methods faces the challenge of the structural complexity of molecular graphs. These graphs are not amenable to being represented in a suitable way for such methods. The most popular representation is the SMILES notation standard. However, it comes with some limitations, such as the abundance of nonvalid strings and the fact that similar strings often represent very different molecules. In this work, a completely different approach to chemical nomenclature is presented. A reduced instruction set is defined, and the language of all strings that are sequences of such instructions is considered. Instructions provide the means to incrementally add atoms and modify the connectivity of the chemical bonds of atoms to be inserted. Instructions are carefully crafted to guarantee that all strings of this language are valid, i.e., each string represents a molecule. Moreover, slight changes in a string usually correspond to small modifications in the represented molecule. Therefore, this approach is appropriate for use in state-of-the-art computational intelligence systems for chemical information processing, including deep learning models.
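
The idea of an instruction language in which every string decodes to a molecule can be illustrated with a small sketch. The instruction alphabet below (`C`, `N`, `O`, `F`, `=`, `#`, `<`), the valence table, and the function names `run` and `is_valid` are all hypothetical and chosen only for illustration; they are not the instruction set defined in the paper. The sketch assumes that validity means no atom exceeds its maximum valence, with unused valence filled by implicit hydrogens.

```python
from itertools import product

# Hypothetical toy alphabet; NOT the instruction set defined in the paper.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}
ALPHABET = "CNOF=#<"


def run(program):
    """Interpret a string of toy instructions and return (atoms, bonds)."""
    atoms = []    # element symbol of each atom
    parent = []   # atom each atom was attached to (-1 for the first atom)
    used = []     # valence consumed by explicit bonds, per atom
    bonds = {}    # (i, j) -> bond order, with i < j
    cursor = -1   # atom that the next atom will attach to
    pending = 1   # bond order requested for the next insertion
    for op in program:
        if op == "=":
            pending = 2
        elif op == "#":
            pending = 3
        elif op == "<":
            # Step back to the atom the cursor atom was attached to, if any.
            if cursor >= 0 and parent[cursor] >= 0:
                cursor = parent[cursor]
        elif op in MAX_VALENCE:
            if cursor < 0:
                # First atom: nothing to bond to yet.
                atoms.append(op)
                parent.append(-1)
                used.append(0)
                cursor = 0
            else:
                # Clamp the requested bond order to what both atoms allow.
                free = MAX_VALENCE[atoms[cursor]] - used[cursor]
                order = min(pending, free, MAX_VALENCE[op])
                if order > 0:
                    new = len(atoms)
                    atoms.append(op)
                    parent.append(cursor)
                    used.append(order)
                    used[cursor] += order
                    bonds[(cursor, new)] = order
                    cursor = new
                # If order == 0 the cursor atom is saturated: skip the atom.
            pending = 1
        # Any other character is ignored, so no string can fail to decode.
    return atoms, bonds


def is_valid(atoms, bonds):
    """Check that no atom exceeds its maximum valence."""
    used = [0] * len(atoms)
    for (i, j), order in bonds.items():
        used[i] += order
        used[j] += order
    return all(u <= MAX_VALENCE[a] for a, u in zip(atoms, used))


if __name__ == "__main__":
    print(run("CC=O"))  # (['C', 'C', 'O'], {(0, 1): 1, (1, 2): 2})
    print(run("CC=N"))  # same bonds, only the last atom differs
```

In this toy language a requested bond order is clamped rather than rejected, and an atom that cannot be attached is skipped, so decoding never fails. Changing one character, as in `CC=O` versus `CC=N`, changes one atom and leaves the rest of the graph intact, which mirrors the locality property the abstract describes.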
