A framework for integrating biomedical knowledge in Wikidata with open biological and biomedical ontologies and MeSH keywords.

Authors

Turki Houcemeddine, Chebil Khalil, Dossou Bonaventure F P, Emezue Chris Chinenye, Owodunni Abraham Toluwase, Hadj Taieb Mohamed Ali, Ben Aouicha Mohamed

Affiliations

Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia.

SisonkeBiotik Research Community, Johannesburg, South Africa.

Publication information

Heliyon. 2024 Sep 27;10(19):e38448. doi: 10.1016/j.heliyon.2024.e38448. eCollection 2024 Oct 15.

Abstract

This study presents a comprehensive framework for enhancing Wikidata as an open, collaborative knowledge graph by integrating Open Biological and Biomedical Ontologies (OBO) and Medical Subject Headings (MeSH) keywords from PubMed publications. The primary data sources are OBO ontologies and MeSH keywords, which were collected and classified using SPARQL queries over RDF knowledge graphs. The semantic alignment between OBO ontologies and Wikidata was evaluated, revealing significant gaps and distorted representations that call for both automated and manual correction. We used pointwise mutual information to extract biomedical relations among the 5000 most common MeSH keywords in PubMed, achieving an accuracy of 89.40% for superclass-based classification and 75.32% for relation type-based classification. In addition, Integrated Gradients was used to refine the classification by removing irrelevant MeSH qualifiers, improving overall efficiency. The framework also explored the use of MeSH keywords to identify PubMed reviews that back unsupported Wikidata relations, finding that 45.8% of these relations were not present in PubMed, which points to potential inconsistencies in Wikidata. The contributions of this study include improved methodologies for enriching Wikidata with biomedical information, validated semantic alignments, and efficient classification processes. This work enhances the interoperability and multilingual capabilities of biomedical ontologies and demonstrates the critical role of MeSH keywords in verifying semantic relations, thereby contributing to the robustness and accuracy of collaborative biomedical knowledge graphs.
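To make the pointwise mutual information step concrete, the following is a minimal sketch of how PMI between co-occurring keywords can be computed. It is not the authors' implementation: the function name `pmi_scores`, the document-level co-occurrence counting, the base-2 logarithm, and the four toy keyword lists are all assumptions made for illustration, and the keyword sets are invented rather than taken from PubMed.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(docs):
    """Return PMI for every keyword pair that co-occurs in at least one document.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), where probabilities are
    estimated as the fraction of documents containing the keyword(s).
    """
    n_docs = len(docs)
    term_counts = Counter()
    pair_counts = Counter()
    for keywords in docs:
        unique = sorted(set(keywords))  # count each keyword once per document
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # pairs in sorted order

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = term_counts[a] / n_docs
        p_b = term_counts[b] / n_docs
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Hypothetical toy corpus: each inner list stands for the MeSH keywords
# attached to one publication (invented for illustration only).
toy_docs = [
    ["Hypertension", "Stroke"],
    ["Hypertension", "Stroke"],
    ["Hypertension", "Diabetes Mellitus"],
    ["Asthma"],
]

scores = pmi_scores(toy_docs)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

A positive score means the two keywords appear together more often than independence would predict; pairs that never co-occur are simply absent from the result. How the study thresholds or ranks such scores is not stated in the abstract, so that step is left out here.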

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3cf/11471508/7fd028dac07d/gr1.jpg
