Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, United States.
Beth Israel Deaconess Medical Center, Boston, MA 02215, United States.
Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad130.
Biomedical identifier resources (such as ontologies, taxonomies, and controlled vocabularies) commonly overlap in scope and contain equivalent entries under different identifiers. Maintaining mappings between these entries is crucial for interoperability and the integration of data and knowledge. However, there are substantial gaps in the available mappings, which motivates their semi-automated curation.
Biomappings implements a curation workflow for missing mappings that combines automated prediction with human-in-the-loop curation. It supports multiple prediction approaches and provides a web-based user interface for reviewing predicted mappings for correctness, combined with automated consistency checking. Predicted and curated mappings are made available in public, version-controlled resource files on GitHub. Biomappings currently makes available 9274 curated mappings and 40 691 predicted ones, providing previously missing mappings between widely used identifier resources covering small molecules, cell lines, diseases, and other concepts. We demonstrate the value of Biomappings in case studies involving predicting and curating missing mappings among cancer cell lines as well as small molecules tested in clinical trials. We also present how previously missing mappings curated using Biomappings were contributed back to multiple widely used community ontologies.
The data and code are available under the CC0 and MIT licenses at https://github.com/biopragmatics/biomappings.
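As a rough illustration of the workflow described above (automated prediction, human review, then consistency checking), the following minimal Python sketch may help. It is not the Biomappings API: the `Mapping` class, the function names, the exact-match-on-normalized-label heuristic, the conflict rule, and the `resA`/`resB` identifiers are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """A hypothetical mapping record between two identifiers (CURIEs)."""
    source: str
    target: str
    status: str = "predicted"  # "predicted", "correct", or "incorrect"


def normalize(label: str) -> str:
    """Lowercase a label and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def predict_mappings(source_terms, target_terms):
    """Propose a mapping wherever two normalized labels are identical.

    This is a simple lexical heuristic chosen for illustration only.
    """
    index = {}
    for curie, label in target_terms.items():
        index.setdefault(normalize(label), []).append(curie)
    return [
        Mapping(source_curie, target_curie)
        for source_curie, label in source_terms.items()
        for target_curie in index.get(normalize(label), [])
    ]


def curate(mapping, is_correct):
    """Record a human reviewer's decision on a predicted mapping."""
    status = "correct" if is_correct else "incorrect"
    return Mapping(mapping.source, mapping.target, status)


def find_conflicts(mappings):
    """Find sources marked correct against several targets in one resource.

    Returns a dict keyed by (source CURIE, target prefix).
    """
    seen = {}
    for m in mappings:
        if m.status != "correct":
            continue
        prefix = m.target.split(":", 1)[0]
        seen.setdefault((m.source, prefix), set()).add(m.target)
    return {key: targets for key, targets in seen.items() if len(targets) > 1}


# Made-up identifiers and labels, for illustration only.
source_terms = {"resA:001": "HeLa", "resA:002": "MCF-7"}
target_terms = {"resB:101": "HELA", "resB:102": "MCF7", "resB:103": "mcf 7"}

predictions = predict_mappings(source_terms, target_terms)
reviewed = [curate(m, is_correct=True) for m in predictions]
conflicts = find_conflicts(reviewed)
```

Here the second source term matches two target entries, so accepting both predictions is flagged as a conflict for a curator to resolve. A real deployment would likely use richer prediction methods and checks than this sketch.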