National Magnetic Resonance Facility at Madison, Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.
Sci Data. 2017 May 23;4:170073. doi: 10.1038/sdata.2017.73.
Rigorous characterization of small organic molecules in terms of their structural and biological properties is vital to biomedical research. The three-dimensional structure of a molecule, its 'photo ID', is inefficient for searching and matching tasks. Instead, identifiers play a key role in accessing compound data. Unique and reproducible molecule and atom identifiers are required to ensure the correct cross-referencing of properties associated with compounds archived in databases. The best approach to meeting this requirement is the International Chemical Identifier (InChI). However, the current implementation of InChI fails to provide a complete standard for atom nomenclature, and incorrect use of the InChI standard has resulted in the proliferation of non-unique identifiers. We propose a methodology and associated software tools, named ALATIS, that overcome these shortcomings. ALATIS is an adaptation of InChI that operates fully within the InChI convention to provide unique and reproducible molecule and all-atom identifiers. ALATIS includes an InChI extension for unique atom labeling of symmetric molecules. ALATIS forms the basis for improving reproducibility and unifying cross-referencing across databases.