Linked open data-based framework for automatic biomedical ontology generation.

Author affiliations

Computer Science and Engineering Department, Oakland University, 2200 N. Squirrel Rd, Rochester, MI, 48309, USA.

Micro Focus International plc, Troy, MI, 48084, USA.

Publication information

BMC Bioinformatics. 2018 Sep 10;19(1):319. doi: 10.1186/s12859-018-2339-3.

Abstract

BACKGROUND

Realizing the vision of the Semantic Web requires an accurate data model for organizing knowledge and sharing a common understanding of a domain. Ontologies fit this description: they are the cornerstones of the Semantic Web and can be used to solve many problems in clinical information and biomedical engineering, such as word sense disambiguation, semantic similarity, question answering, and ontology alignment. Manual construction of an ontology is labor intensive and requires both domain experts and ontology engineers. To reduce this labor and minimize the need for domain experts, we present a novel automated ontology generation framework, Linked Open Data approach for Automatic Biomedical Ontology Generation (LOD-ABOG), which is powered by Linked Open Data (LOD). LOD-ABOG performs concept extraction using knowledge bases, mainly UMLS and LOD, together with Natural Language Processing (NLP) operations, and performs relation extraction using LOD, a Breadth-First Search (BFS) graph method, and Freepal repository patterns.
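To illustrate how a breadth-first search over a graph of linked-data triples can surface a relation path between two concepts, here is a minimal sketch. It is not the LOD-ABOG implementation: the triples, the predicate names, and the helpers `build_graph` and `bfs_path` are hypothetical and chosen only for illustration, and the abstract does not specify how the framework bounds or scores its search.

```python
from collections import deque

# Hypothetical toy triples (subject, predicate, object); not taken from the paper.
TRIPLES = [
    ("Donepezil", "treats", "Alzheimer disease"),
    ("Alzheimer disease", "is_a", "Dementia"),
    ("Dementia", "is_a", "Brain disease"),
    ("Donepezil", "is_a", "Cholinesterase inhibitor"),
]


def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, []).append((predicate, obj))
    return graph


def bfs_path(graph, start, goal, max_depth=3):
    """Return the shortest chain of triples linking start to goal, or None.

    max_depth is an assumed cut-off on the number of hops explored.
    """
    if start == goal:
        return []
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if len(path) >= max_depth:
            continue  # do not expand beyond the hop limit
        for predicate, neighbor in graph.get(node, []):
            if neighbor in visited:
                continue
            new_path = path + [(node, predicate, neighbor)]
            if neighbor == goal:
                return new_path
            visited.add(neighbor)
            queue.append((neighbor, new_path))
    return None


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for triple in bfs_path(graph, "Donepezil", "Dementia") or []:
        print(triple)
```

Because breadth-first search expands nodes level by level, the first path that reaches the target concept is also one with the fewest hops, which is why a hop limit is a natural way to keep the search small on a large graph.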

RESULTS

Our evaluation shows improved results on most ontology generation tasks compared with those obtained by existing frameworks. We evaluated the individual tasks (modules) of the proposed framework using the CDR and SemMedDB datasets. For concept extraction, the average F-measure is 58.12% on the CDR corpus and 81.68% on SemMedDB. For biomedical taxonomic relation extraction, the F-measure is 65.26% on CDR and 77.44% on SemMedDB. For biomedical non-taxonomic relation extraction, the F-measure is 52.78% on CDR and 58.12% on SemMedDB. In addition, comparison with a manually constructed baseline Alzheimer ontology shows an F-measure of 72.48% for concept detection, 76.27% for relation extraction, and 83.28% for property extraction. We also compared the proposed framework with the ontology-learning framework OntoGain; LOD-ABOG performs 14.76% better in relation extraction.
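For readers unfamiliar with the metric, the sketch below shows how an F-measure can be computed from a set of extracted items and a gold-standard set. It assumes the balanced F1 (harmonic mean of precision and recall) with exact-match comparison; the abstract does not state which variant or matching rule was used, and the function name and example sets are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and balanced F1 for two collections of items.

    Assumes exact-match comparison between predicted and gold items.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical concept sets, for illustration only.
    extracted = {"dementia", "donepezil", "memory loss", "aspirin"}
    reference = {"dementia", "donepezil", "amyloid plaque"}
    p, r, f1 = precision_recall_f1(extracted, reference)
    print(f"precision={p:.2%} recall={r:.2%} F1={f1:.2%}")
```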

CONCLUSION

This paper has presented the LOD-ABOG framework, which shows that current LOD sources and technologies are a promising way to automate biomedical ontology generation and to extract relations to a greater extent. In addition, unlike existing frameworks, which require domain experts during the ontology development process, the proposed approach involves them only for refinement at the end of the ontology life cycle.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eff3/6131949/6cb85ae69f0b/12859_2018_2339_Fig1_HTML.jpg