Müller Simon
Institute of Thermal Separation Processes, Hamburg University of Technology, Eißendorfer Straße 38, 21073, Hamburg, Germany.
J Cheminform. 2025 Aug 4;17(1):117. doi: 10.1186/s13321-025-01064-7.
Accurate chemical structure resolution from textual identifiers such as names and CAS RN® is critical for computational modeling in chemistry and related fields. This paper introduces MoleculeResolver, an automated, robust Python-based tool designed to address inconsistencies and inaccuracies commonly encountered when converting chemical identifiers to canonical SMILES strings. MoleculeResolver systematically crosschecks structures retrieved from multiple reputable chemical databases, implements rigorous identifier plausibility checks, standardizes molecular structures, and intelligently selects the most accurate representation based on a unique resolution algorithm.

SCIENTIFIC CONTRIBUTION: Benchmarks across diverse datasets confirm that MoleculeResolver significantly enhances precision, recall, and overall reliability compared to traditional single-source methods, proving its utility as a valuable resource for chemists, data scientists, and researchers engaged in high-quality molecular data analysis and predictive model development.
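The abstract does not spell out the plausibility checks or the resolution algorithm, so the following is only a minimal sketch of two ideas it mentions: an identifier plausibility check and a cross-source agreement step. The function names `is_plausible_cas_rn` and `pick_consensus`, the `min_votes` threshold, and the majority-vote rule are illustrative assumptions, not MoleculeResolver's API or its actual selection logic. The CAS RN check-digit rule itself is the standard one: weight the digits 1, 2, 3, ... from the right (excluding the check digit), sum them, and take the result modulo 10.

```python
import re
from collections import Counter
from typing import Dict, Optional

# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_plausible_cas_rn(cas_rn: str) -> bool:
    """Return True if the string has CAS RN format and a valid check digit."""
    match = CAS_PATTERN.match(cas_rn.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


def pick_consensus(candidates: Dict[str, str], min_votes: int = 2) -> Optional[str]:
    """Hypothetical selection rule: simple majority vote across sources.

    `candidates` maps a source name to a SMILES string that is assumed to be
    standardized and canonicalized already, so equal structures compare equal.
    Returns None when no structure reaches `min_votes` or when the top two tie.
    """
    if not candidates:
        return None
    ranked = Counter(candidates.values()).most_common(2)
    best_smiles, best_votes = ranked[0]
    if best_votes < min_votes:
        return None
    if len(ranked) > 1 and ranked[1][1] == best_votes:
        return None  # ambiguous: two structures with equal support
    return best_smiles
```

As a usage illustration, `is_plausible_cas_rn("64-17-5")` accepts the CAS RN of ethanol, while a mistyped check digit such as `"64-17-6"` is rejected before any database is queried. In practice the SMILES strings passed to `pick_consensus` would first need to be canonicalized with a cheminformatics toolkit; that step is left out here to keep the sketch dependency-free.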