Making the InChI FAIR and sustainable while moving to inorganics.

Authors

Blanke Gerd, Brammer Jan, Baljozovic Djordje, Khan Nauman Ullah, Lange Frank, Bänsch Felix, Tovee Clare A, Schatzschneider Ulrich, Hartshorn Richard M, Herres-Pawlis Sonja

Affiliations

StructurePendium GmbH, Essen, Germany.

Institut für Anorganische Chemie, Landoltweg 1a, 52074 Aachen, Germany.

Publication information

Faraday Discuss. 2025 Jan 14;256(0):503-519. doi: 10.1039/d4fd00145a.

Abstract

The InChI (International Chemical Identifier) standard is a cornerstone of chemical informatics, facilitating the structure-based identification of compounds and the exchange of chemical information about them across platforms and databases. As a unique canonical line notation, the InChI has made chemical structures searchable on the internet at a broad scale; the largest repositories working with InChIs contain more than 1 billion structures. Central to the functionality of the InChI is its codebase, which orchestrates a series of intricate steps to generate unique identifiers for chemical compounds. Until now, these steps have been sparsely documented, and the InChI algorithm had to be treated as a black box. For the new v1.07 release, the code has been analyzed and the major steps documented, and more than 3000 bugs and security issues, as well as nearly 60 Google OSS-Fuzz issues, have been fixed. New test systems have been implemented that allow users to directly test the code developments. The move to GitHub has not only made the development more transparent but will also enable external contributors to join the further development of the InChI code. The motivation for this modernisation was the urgent need for the InChI to treat molecular inorganic compounds in a meaningful way. Until now, no classic string representation has fulfilled this need of molecular inorganic chemistry. Currently, bonds to metal centers are disconnected by definition, which makes most inorganic InChIs meaningless. Herein, we propose new routines to remedy this problem in the representation of molecular inorganic compounds by the InChI.
