Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system.

Author information

Doing-Harris Kristina, Livnat Yarden, Meystre Stephane

Affiliations

University of Utah, Department of Biomedical Informatics, 421 Wakara Way, Suite 140, Salt Lake City, UT 84112 USA.

Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT USA.

Publication information

J Biomed Semantics. 2015 Apr 2;6:15. doi: 10.1186/s13326-015-0011-7. eCollection 2015.

DOI:10.1186/s13326-015-0011-7

PMID:25874077

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4396714/

Abstract

BACKGROUND

We develop medical-specialty-specific ontologies that capture settled science and common term usage. To streamline ontology development, we leverage current practices in information and relationship extraction. Our SEmi-Automated ontology Maintenance (SEAM) system combines different text types with these extraction techniques in a low-overhead, modifiable system, and features a natural language processing pipeline for information extraction. Synonym and hierarchical groups are identified using corpus-based semantics and lexico-syntactic patterns. The semantic vectors we use are term frequency-inverse document frequency (TF-IDF) vectors and context vectors. Clinical documents contain the terms we want in an ontology, but they also contain idiosyncratic usage and are unlikely to contain the linguistic constructs associated with synonym and hierarchy identification. By including both clinical and biomedical texts, SEAM can recommend terms that appear in both document types. The set of recommended terms is then used to filter the synonyms and hierarchical relationships extracted from the biomedical corpus. We demonstrate the generality of the system across three use cases: ontologies for acute changes in mental status, Medically Unexplained Syndromes, and echocardiogram summary statements.
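The two-corpus filtering idea described above can be illustrated with a minimal sketch. It is not SEAM's implementation: the function names, the single-word tokenizer, the smoothed IDF formula, the `min_score` parameter, and the single "X such as Y" pattern are all assumptions chosen for brevity. It scores terms by TF-IDF in each corpus, recommends terms that appear in both, and keeps only those hierarchical pairs from the biomedical text whose members are both recommended.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplification: lowercase single-word tokens only (no multi-word terms).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(docs):
    """Return each term's highest TF-IDF weight across the given documents."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    best = {}
    for toks in tokenized:
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            # Smoothed IDF (an assumption; the abstract does not give a formula).
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            best[term] = max(best.get(term, 0.0), tf * idf)
    return best


def recommend_terms(clinical_docs, biomedical_docs, min_score=0.0):
    """Recommend terms present in BOTH corpora with a score >= min_score in each."""
    clinical = tfidf_scores(clinical_docs)
    biomedical = tfidf_scores(biomedical_docs)
    shared = set(clinical) & set(biomedical)
    return {t for t in shared
            if clinical[t] >= min_score and biomedical[t] >= min_score}


# One illustrative lexico-syntactic pattern: "<parent> such as <child>".
SUCH_AS = re.compile(r"([a-z]+) such as ([a-z]+)")


def hierarchical_pairs(biomedical_docs, recommended):
    """Extract (parent, child) pairs, keeping only recommended terms."""
    pairs = set()
    for doc in biomedical_docs:
        for parent, child in SUCH_AS.findall(doc.lower()):
            if parent in recommended and child in recommended:
                pairs.add((parent, child))
    return pairs
```

With toy inputs, `recommend_terms(clinical_docs, biomedical_docs)` returns the shared vocabulary, and `hierarchical_pairs(biomedical_docs, recommended)` returns only the pairs whose parent and child both survived that filter. A real pipeline would also need stop-word handling and multi-word term detection, which this sketch omits.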

RESULTS

Across the three use cases, we held the number of recommended terms relatively constant by changing SEAM's parameters. Experts seem to find more than 300 recommended terms overwhelming. The approval rate of recommended terms increased as the number and specificity of clinical documents in the corpus increased: it was 60% with 199 clinical documents that were not specific to the ontology domain and 90% with 2879 documents very specific to the target domain. We found that fewer than 100 recommended synonym groups were also preferred. Approval rates for synonym recommendations remained low, varying from 43% to 25% as the number of journal articles increased from 19 to 47. Overall, the number of recommended hierarchical relationships was very low, although approval was good, varying between 67% and 31%.
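The abstract does not say which parameters were adjusted to keep the recommendation list short. As a purely hypothetical illustration, the sketch below assumes each candidate term has a numeric score and picks the lowest score threshold that leaves at most `max_terms` terms; it also computes an approval rate as the fraction of recommended terms an expert approved. The names `threshold_for_cap` and `approval_rate` are invented for this example.

```python
def threshold_for_cap(scores, max_terms=300):
    """Return the lowest threshold t such that at most max_terms terms score >= t."""
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) <= max_terms:
        # Everything fits: the smallest score (or 0.0 if empty) keeps all terms.
        return min(ranked, default=0.0)
    # The first score that must be excluded; ties with it are excluded too.
    cutoff = ranked[max_terms]
    higher = [s for s in ranked if s > cutoff]
    return min(higher) if higher else float("inf")


def approval_rate(approved, recommended):
    """Fraction of recommended terms that were approved by the expert."""
    if not recommended:
        return 0.0
    return len(approved & recommended) / len(recommended)
```

Excluding ties at the cutoff means the list can come out shorter than `max_terms` but never longer, which matches the stated goal of not overwhelming the reviewing expert.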

CONCLUSION

SEAM produced a concise list of recommended clinical terms, synonyms and hierarchical relationships regardless of medical domain.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4c9/4396714/f1db7c17cece/13326_2015_11_Fig1_HTML.jpg
