Merkys Andrius, Vaitkus Antanas, Grybauskas Algirdas, Konovalovas Aleksandras, Quirós Miguel, Gražulis Saulius
Sector of Crystallography and Chemical Informatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania.
Department of Biochemistry and Molecular Biology, Institute of Biosciences, Life Sciences Center, Vilnius University, Vilnius, Lithuania.
J Cheminform. 2023 Feb 23;15(1):25. doi: 10.1186/s13321-023-00692-1.
Published reports of chemical compounds often contain multiple machine-readable descriptions that can supplement one another to yield a coherent and complete chemical representation. This publication presents a method for cross-checking such descriptions using canonical representations and isomorphism of molecular graphs. If the descriptions of a compound do not agree immediately, the algorithm derives the minimal set of simplifications, if any exists, required for both descriptions to arrive at a matching form. The proposed algorithm is applied to chemical descriptions from the Crystallography Open Database to identify coherently described entries as well as those requiring further curation.
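The cross-checking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the molecule encoding (a list of (element, charge) atoms and a list of (i, j, order) bonds), the three simplifications, and all function names below are assumptions made for illustration, and "minimal" is taken here to mean the fewest simplifications. The canonical form is found by brute force over atom orderings, which is only practical for very small graphs.

```python
from itertools import combinations, permutations

# Hypothetical simplifications; each maps (atoms, bonds) to a coarser description.
def drop_hydrogens(atoms, bonds):
    keep = [i for i, (element, _) in enumerate(atoms) if element != "H"]
    new_index = {old: new for new, old in enumerate(keep)}
    new_bonds = [(new_index[a], new_index[b], order)
                 for a, b, order in bonds
                 if a in new_index and b in new_index]
    return [atoms[i] for i in keep], new_bonds

def ignore_bond_orders(atoms, bonds):
    return atoms, [(a, b, 1) for a, b, _ in bonds]

def ignore_charges(atoms, bonds):
    return [(element, 0) for element, _ in atoms], bonds

SIMPLIFICATIONS = {
    "drop_hydrogens": drop_hydrogens,
    "ignore_bond_orders": ignore_bond_orders,
    "ignore_charges": ignore_charges,
}

def canonical_form(atoms, bonds):
    """Smallest (labels, edges) tuple over all atom orderings.

    Two labelled graphs are isomorphic exactly when their canonical
    forms are equal. Brute force: factorial cost in the atom count.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[old] is the new position of atom `old`.
        labels = [None] * len(atoms)
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted((min(perm[a], perm[b]), max(perm[a], perm[b]), order)
                       for a, b, order in bonds)
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best

def minimal_simplifications(mol_a, mol_b):
    """Return the smallest tuple of simplification names under which the
    two descriptions match, () if they already agree, or None if no
    combination of the listed simplifications reconciles them."""
    names = sorted(SIMPLIFICATIONS)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            a, b = mol_a, mol_b
            for name in subset:
                a = SIMPLIFICATIONS[name](*a)
                b = SIMPLIFICATIONS[name](*b)
            if canonical_form(*a) == canonical_form(*b):
                return subset
    return None

# Formaldehyde with explicit hydrogens versus a hydrogen-free description.
explicit = ([("C", 0), ("O", 0), ("H", 0), ("H", 0)],
            [(0, 1, 2), (0, 2, 1), (0, 3, 1)])
implicit = ([("O", 0), ("C", 0)], [(0, 1, 2)])
print(minimal_simplifications(explicit, implicit))
```

Searching subsets in order of increasing size guarantees that the first match uses as few simplifications as possible; a production tool would replace the brute-force canonicalisation with a proper graph canonicalisation or isomorphism routine.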