How to crack a SMILES: automatic crosschecked chemical structure resolution across multiple services using MoleculeResolver.

Author information

Simon Müller

Affiliation

Institute of Thermal Separation Processes, Hamburg University of Technology, Eißendorfer Straße 38, 21073, Hamburg, Germany.

Publication information

J Cheminform. 2025 Aug 4;17(1):117. doi: 10.1186/s13321-025-01064-7.

DOI:10.1186/s13321-025-01064-7
PMID:40760698
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12323220/
Abstract

Accurate chemical structure resolution from textual identifiers such as names and CAS RN® is critical for computational modeling in chemistry and related fields. This paper introduces MoleculeResolver, an automated, robust Python-based tool designed to address inconsistencies and inaccuracies commonly encountered when converting chemical identifiers to canonical SMILES strings. MoleculeResolver systematically crosschecks structures retrieved from multiple reputable chemical databases, implements rigorous identifier plausibility checks, standardizes molecular structures, and intelligently selects the most accurate representation based on a unique resolution algorithm.

SCIENTIFIC CONTRIBUTION: Benchmarks across diverse datasets confirm that MoleculeResolver significantly enhances precision, recall, and overall reliability compared to traditional single-source methods, proving its utility as a valuable resource for chemists, data scientists, and researchers engaged in high-quality molecular data analysis and predictive model development.
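The abstract mentions identifier plausibility checks without detailing them. One check that applies to every CAS RN is its check digit. The number has three hyphen-separated parts (2 to 7 digits, then 2 digits, then a single check digit). The digits before the check digit are weighted 1, 2, 3, … starting from the right, and the weighted sum modulo 10 must equal the check digit. The minimal Python sketch below implements only that rule. The function name `is_valid_cas_rn` is hypothetical and is not MoleculeResolver's API.

```python
import re

# Hypothetical helper, not MoleculeResolver's API: validates the syntax and
# check digit of a CAS Registry Number such as "7732-18-5" (water).
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(cas: str) -> bool:
    """Return True if `cas` has CAS RN syntax and a correct check digit."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)  # every digit except the check digit
    check_digit = int(match.group(3))
    # Weight the rightmost body digit by 1, the next by 2, and so on.
    total = sum(
        position * int(digit)
        for position, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


if __name__ == "__main__":
    for candidate in ["7732-18-5", "7732-18-4", "64-17-5", "not-a-cas"]:
        print(candidate, is_valid_cas_rn(candidate))
```

A failing check digit reveals a mistyped or corrupted identifier before any database is queried. A passing check digit does not prove that the number is actually registered, or that it belongs to the compound named alongside it.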

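Crosschecking can be illustrated with a simple vote. You query several services, standardize what each one returns, and prefer the structure that most services agree on. The sketch below is a hypothetical simplification and not the resolution algorithm described in the paper. It assumes the SMILES strings have already been standardized to a single canonical form. The service names and the `service_priority` tie-breaker are illustrative only.

```python
from collections import Counter
from typing import Optional

# Hypothetical illustration only, not MoleculeResolver's algorithm.
# Assumes each service's result is already a standardized SMILES, or None
# when the service found nothing.


def select_consensus_smiles(
    candidates: dict[str, Optional[str]],
    service_priority: list[str],
) -> Optional[str]:
    """Return the SMILES reported by the most services.

    Ties go to the first service in `service_priority` whose structure is
    among the tied ones; otherwise the lexicographically smallest is chosen.
    """
    votes = Counter(smiles for smiles in candidates.values() if smiles)
    if not votes:
        return None  # no service returned a structure
    best_count = max(votes.values())
    tied = {smiles for smiles, count in votes.items() if count == best_count}
    for service in service_priority:
        if candidates.get(service) in tied:
            return candidates[service]
    # Deterministic fallback when no prioritized service is among the tied ones.
    return sorted(tied)[0]


if __name__ == "__main__":
    results = {"service_a": "CCO", "service_b": "CCO", "service_c": "COC"}
    print(select_consensus_smiles(results, ["service_c"]))
```

Standardization has to happen before the vote. The same molecule can be written as different SMILES strings (ethanol as `CCO` or `OCC`), so comparing unstandardized strings would split the votes for one structure.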
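The benchmarks are reported in terms of precision and recall, but the abstract does not define these for structure resolution. The sketch below assumes one common convention, which may differ from the paper's. Precision is computed over the identifiers the tool actually resolved, and recall over all identifiers in the benchmark. Under this convention, returning no structure lowers recall but not precision, while returning a wrong structure lowers both. The function name `precision_recall` is hypothetical.

```python
from typing import Optional


def precision_recall(
    predicted: list[Optional[str]], reference: list[str]
) -> tuple[float, float]:
    """Precision and recall for structure resolution (assumed definitions).

    precision = correct / number of identifiers the tool resolved
    recall    = correct / number of identifiers in the benchmark
    A prediction of None means the tool returned no structure.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    resolved = [(p, r) for p, r in zip(predicted, reference) if p is not None]
    correct = sum(1 for p, r in resolved if p == r)
    precision = correct / len(resolved) if resolved else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall
```

For example, `precision_recall(["CCO", "COC", None, "O"], ["CCO", "CCO", "C", "O"])` returns `(2/3, 0.5)`. Three structures were returned and two of them are correct, out of four identifiers in total.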


