STOUT V2.0: SMILES to IUPAC name conversion using transformer models.

Authors

Rajan Kohulan, Zielesny Achim, Steinbeck Christoph

Affiliations

Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany.

Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665, Recklinghausen, Germany.

Publication information

J Cheminform. 2024 Dec 27;16(1):146. doi: 10.1186/s13321-024-00941-x.

DOI:10.1186/s13321-024-00941-x
PMID:39731139
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11673719/

Abstract

Naming chemical compounds systematically is a complex task governed by a set of rules established by the International Union of Pure and Applied Chemistry (IUPAC). These rules are universal and widely accepted by chemists worldwide, but their complexity makes it challenging for individuals to consistently apply them accurately. A translation method can be employed to address this challenge. Accurate translation of chemical compounds from SMILES notation into their corresponding IUPAC names is crucial, as it can significantly streamline the laborious process of naming chemical structures. Here, we present STOUT (SMILES-TO-IUPAC-name translator) V2, which addresses this challenge by introducing a transformer-based model that translates string representations of chemical structures into IUPAC names. Trained on a dataset of nearly 1 billion SMILES strings and their corresponding IUPAC names, STOUT V2 demonstrates exceptional accuracy in generating IUPAC names, even for complex chemical structures. The model's ability to capture intricate patterns and relationships within chemical structures enables it to generate precise and standardised IUPAC names. While established deterministic algorithms remain the gold standard for systematic chemical naming, our work, enabled by access to OpenEye's Lexichem software through an academic license, demonstrates the potential of neural approaches to complement existing tools in chemical nomenclature.

Scientific contribution: STOUT V2, built upon transformer-based models, is a significant advancement from our previous work. The web application enhances its accessibility and utility. By making the model and source code fully open and well-documented, we aim to promote unrestricted use and encourage further development.
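To illustrate what treating a SMILES string as input to a sequence translation model can look like, here is a minimal sketch of a regex-based SMILES tokenizer, a common preprocessing step for transformer models. The abstract does not describe STOUT V2's tokenization scheme; the pattern and the function name `tokenize_smiles` are assumptions for illustration only, and the pattern covers only a common subset of SMILES syntax.

```python
import re
from typing import List

# Hypothetical token pattern (not taken from STOUT): bracket atoms first,
# then two-letter halogens, single-letter atoms, ring closures, and symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+\\/.:~@*$])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; raise if any character is unmatched."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify full coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognised characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Alanine with a stereocentre; the bracket atom stays a single token.
    print(tokenize_smiles("C[C@H](N)C(=O)O"))
```

Keeping bracket atoms such as `[C@H]` as single tokens is one possible design choice; a character-level split would be equally valid input for a sequence model but would yield longer sequences.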

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231d/11673719/37c2e0126ecd/13321_2024_941_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231d/11673719/ef032c384b4e/13321_2024_941_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231d/11673719/36de36a2c157/13321_2024_941_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231d/11673719/b3dd694a6bf4/13321_2024_941_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231d/11673719/0064a7297c9d/13321_2024_941_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231d/11673719/f96bf1e64cd8/13321_2024_941_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231d/11673719/e0e20477e7c7/13321_2024_941_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231d/11673719/277d5e51481e/13321_2024_941_Fig7_HTML.jpg
