Ambiguity of non-systematic chemical identifiers within and between small-molecule databases.

Author information

Akhondi Saber A, Muresan Sorel, Williams Antony J, Kors Jan A

Affiliations

Department of Medical Informatics, Erasmus University Medical Centre, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands.

Food Control Department, Banat University of Agricultural Sciences and Veterinary Medicine, Calea Aradului 119, 300645 Timisoara, Romania.

Publication information

J Cheminform. 2015 Nov 16;7:54. doi: 10.1186/s13321-015-0102-6. eCollection 2015.

DOI:10.1186/s13321-015-0102-6

PMID:26579214

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4646925/

Abstract

BACKGROUND

A wide range of chemical compound databases are currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these chemical databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand names, generic names), which are usually assigned to the compound at the point of registration. The correctness of non-systematic identifiers (i.e., whether an identifier matches the associated structure) can only be assessed manually, which is cumbersome, but it is possible to automatically check their ambiguity (i.e., whether an identifier matches more than one structure). In this study we have quantified the ambiguity of non-systematic identifiers within and between eight widely used chemical databases. We also studied the effect of chemical structure standardization on reducing the ambiguity of non-systematic identifiers.
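To make the ambiguity measure concrete, the following Python sketch computes, for a single database, the percentage of non-systematic identifiers that are linked to more than one distinct structure. It assumes the records are available as (identifier, structure key) pairs, where the structure key could be, for example, a standard InChIKey. The function name, the input format, and the toy records are hypothetical, identifier normalization (case, whitespace, synonym handling) is deliberately left out, and this is not the authors' implementation.

```python
from collections import defaultdict


def ambiguity_percent(pairs):
    """Percentage of identifiers linked to more than one distinct structure.

    `pairs` is an iterable of (identifier, structure_key) tuples from one
    database (hypothetical input format).
    """
    structures = defaultdict(set)  # identifier -> set of distinct structure keys
    for identifier, structure_key in pairs:
        structures[identifier].add(structure_key)
    if not structures:
        return 0.0
    ambiguous = sum(1 for keys in structures.values() if len(keys) > 1)
    return 100.0 * ambiguous / len(structures)


# Hypothetical toy records, not data from the study.
records = [
    ("name_a", "structure_1"),
    ("name_a", "structure_2"),  # same identifier, different structure: ambiguous
    ("name_b", "structure_3"),
    ("name_c", "structure_4"),
    ("name_c", "structure_4"),  # duplicate record, same structure: not ambiguous
]
print(round(ambiguity_percent(records), 1))  # 33.3
```

Using a set per identifier means repeated records that point to the same structure do not inflate the count; only genuinely different structures make an identifier ambiguous.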

RESULTS

The ambiguity of non-systematic identifiers within databases varied from 0.1 to 15.2 % (median 2.5 %). Standardization reduced the ambiguity only to a small extent for most databases. A wide range of ambiguity existed for non-systematic identifiers that are shared between databases (17.7-60.2 %, median of 40.3 %). Removing stereochemistry information provided the largest reduction in ambiguity across databases (median reduction 13.7 percentage points).
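The between-database measure and the effect of removing stereochemistry can be approximated in a similar way. The sketch below pools (identifier, InChIKey) pairs from several databases, keeps only identifiers that occur in at least two of them, and reports the percentage of those shared identifiers that map to more than one structure. As a stand-in for removing stereochemistry it compares only the first InChIKey block, which hashes the core constitution; stereochemical and isotopic layers are encoded in the second block, so this truncation also discards isotopic information. That shortcut is an assumption and not the standardization procedure used in the study, and the pooled calculation may differ from how the study aggregated results per database. The function names and the placeholder keys are hypothetical.

```python
from collections import defaultdict


def skeleton(inchikey):
    # Keep only the first InChIKey block (core constitution), which drops the
    # stereochemical (and isotopic) information encoded in the second block.
    return inchikey.split("-")[0]


def shared_ambiguity(databases, key_fn=lambda key: key):
    """Percentage of identifiers found in >= 2 databases that map to > 1 structure.

    `databases` maps a database name to a list of (identifier, inchikey) pairs.
    `key_fn` transforms each InChIKey before comparison (identity by default).
    """
    sources = defaultdict(set)     # identifier -> names of databases containing it
    structures = defaultdict(set)  # identifier -> pooled, transformed structure keys
    for db_name, pairs in databases.items():
        for identifier, inchikey in pairs:
            sources[identifier].add(db_name)
            structures[identifier].add(key_fn(inchikey))
    shared = [ident for ident, dbs in sources.items() if len(dbs) >= 2]
    if not shared:
        return 0.0
    ambiguous = sum(1 for ident in shared if len(structures[ident]) > 1)
    return 100.0 * ambiguous / len(shared)


# Hypothetical placeholder keys shaped like InChIKeys; not real compounds.
dbs = {
    "db_x": [
        ("name_a", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
        ("name_b", "CCCCCCCCCCCCCC-FFFFFFFFSA-N"),
    ],
    "db_y": [
        ("name_a", "AAAAAAAAAAAAAA-DDDDDDDDSA-N"),  # same skeleton, other stereo block
        ("name_b", "EEEEEEEEEEEEEE-FFFFFFFFSA-N"),  # different skeleton
        ("name_c", "GGGGGGGGGGGGGG-FFFFFFFFSA-N"),  # only in one database: not shared
    ],
}
print(shared_ambiguity(dbs))                   # 100.0
print(shared_ambiguity(dbs, key_fn=skeleton))  # 50.0
```

With these toy inputs, `name_a` counts as ambiguous only while stereochemistry is compared, which mirrors in miniature the kind of reduction reported above; `name_b` stays ambiguous because its two structures differ in connectivity.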

CONCLUSIONS

Ambiguity of non-systematic identifiers within chemical databases is generally low, but ambiguity of non-systematic identifiers that are shared between databases is high. Chemical structure standardization reduces the ambiguity to a limited extent. Our findings can help to improve database integration, curation, and maintenance.

Similar articles

Consistency of systematic chemical identifiers within and between small-molecule databases.

J Cheminform. 2012 Dec 13;4(1):35. doi: 10.1186/1758-2946-4-35.

Consistency, Inconsistency, and Ambiguity of Metabolite Names in Biochemical Databases Used for Genome-Scale Metabolic Modelling.

Metabolites. 2019 Feb 6;9(2):28. doi: 10.3390/metabo9020028.

Biodiversity informatics: the challenge of linking data and the role of shared identifiers.

Brief Bioinform. 2008 Sep;9(5):345-54. doi: 10.1093/bib/bbn022. Epub 2008 Apr 29.

Toward the Reconciliation of Inconsistent Molecular Structures from Biochemical Databases.

J Comput Biol. 2024 Jun;31(6):498-512. doi: 10.1089/cmb.2024.0520. Epub 2024 May 17.

Atom Identifiers Generated by a Neighborhood-Specific Graph Coloring Method Enable Compound Harmonization across Metabolic Databases.

Metabolites. 2020 Sep 11;10(9):368. doi: 10.3390/metabo10090368.

The Chemical Translation Service--a web-based tool to improve standardization of metabolomic reports.

Bioinformatics. 2010 Oct 15;26(20):2647-8. doi: 10.1093/bioinformatics/btq476. Epub 2010 Sep 9.

CART-a chemical annotation retrieval toolkit.

Bioinformatics. 2016 Sep 15;32(18):2869-71. doi: 10.1093/bioinformatics/btw233. Epub 2016 Jun 2.

Gene name ambiguity of eukaryotic nomenclatures.

Bioinformatics. 2005 Jan 15;21(2):248-56. doi: 10.1093/bioinformatics/bth496. Epub 2004 Aug 27.

Synthetic cannabinoid receptor agonists: classification and nomenclature.

Clin Toxicol (Phila). 2020 Feb;58(2):82-98. doi: 10.1080/15563650.2019.1661425. Epub 2019 Sep 16.

Cited by

How to crack a SMILES: automatic crosschecked chemical structure resolution across multiple services using MoleculeResolver.

J Cheminform. 2025 Aug 4;17(1):117. doi: 10.1186/s13321-025-01064-7.

A Metabolites Merging Strategy (MMS): Harmonization to Enable Studies' Intercomparison.

Metabolites. 2023 Nov 21;13(12):1167. doi: 10.3390/metabo13121167.

The heterogeneous pharmacological medical biochemical network PharMeBINet.

Sci Data. 2022 Jul 11;9(1):393. doi: 10.1038/s41597-022-01510-3.

Novel Opioids: Systematic Web Crawling Within the e-Psychonauts' Scenario.

Front Neurosci. 2020 Mar 18;14:149. doi: 10.3389/fnins.2020.00149. eCollection 2020.

Consistency, Inconsistency, and Ambiguity of Metabolite Names in Biochemical Databases Used for Genome-Scale Metabolic Modelling.

Metabolites. 2019 Feb 6;9(2):28. doi: 10.3390/metabo9020028.

Automatic identification of relevant chemical compounds from patents.

Database (Oxford). 2019 Jan 1;2019:baz001. doi: 10.1093/database/baz001.

Challenges of Connecting Chemistry to Pharmacology: Perspectives from Curating the IUPHAR/BPS Guide to PHARMACOLOGY.

ACS Omega. 2018 Jul 31;3(7):8408-8420. doi: 10.1021/acsomega.8b00884.

Chemical entity recognition in patents by combining dictionary-based and statistical approaches.

Database (Oxford). 2016 May 2;2016. doi: 10.1093/database/baw061. Print 2016.
