
Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system.

Author information

Doing-Harris Kristina, Livnat Yarden, Meystre Stephane

Affiliations

University of Utah, Department of Biomedical Informatics, 421 Wakara Way, Suite 140, Salt Lake City, UT 84112 USA.

Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT USA.

Publication information

J Biomed Semantics. 2015 Apr 2;6:15. doi: 10.1186/s13326-015-0011-7. eCollection 2015.

Abstract

BACKGROUND

We develop medical-specialty-specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and relationship extraction techniques in a low-overhead, modifiable system. Our SEmi-Automated ontology Maintenance (SEAM) system features a natural language processing pipeline for information extraction. Synonym and hierarchical groups are identified using corpus-based semantics and lexico-syntactic patterns. The semantic vectors we use are term frequency-inverse document frequency (tf-idf) vectors and context vectors. Clinical documents contain the terms we want in an ontology, but they also contain idiosyncratic usage and are unlikely to contain the linguistic constructs associated with synonym and hierarchy identification. By including both clinical and biomedical texts, SEAM recommends terms from among those appearing in both document types. The set of recommended terms is then used to filter the synonyms and hierarchical relationships extracted from the biomedical corpus. We demonstrate the generality of the system across three use cases: ontologies for acute changes in mental status, Medically Unexplained Syndromes, and echocardiogram summary statements.
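The cross-corpus filtering idea described above can be illustrated with a minimal sketch. It is not SEAM's implementation: the abstract does not say how tf-idf scores are combined or thresholded, so the function names (`tfidf`, `recommend_terms`, `filter_synonym_groups`), the `min_weight` parameter, the max-over-documents scoring, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def recommend_terms(clinical_docs, biomedical_docs, min_weight=0.0):
    """Rank terms that occur in BOTH corpora (assumed scoring scheme).

    A term is scored by its highest tf-idf weight in the biomedical
    corpus; terms absent from the clinical corpus are discarded.
    """
    clinical_vocab = {t for doc in clinical_docs for t in doc}
    scores = {}
    for doc_weights in tfidf(biomedical_docs):
        for term, weight in doc_weights.items():
            if term in clinical_vocab and weight > min_weight:
                scores[term] = max(scores.get(term, 0.0), weight)
    return sorted(scores, key=scores.get, reverse=True)


def filter_synonym_groups(groups, recommended):
    """Keep only recommended terms; drop groups left with fewer than two."""
    keep = set(recommended)
    return [g & keep for g in groups if len(g & keep) >= 2]


# Toy, hand-made token lists (hypothetical data, not from the paper).
clinical = [["patient", "delirium", "confusion"], ["delirium", "agitation"]]
biomedical = [
    ["delirium", "is", "an", "acute", "confusional", "state"],
    ["agitation", "and", "confusion", "are", "symptoms"],
    ["delirium", "and", "agitation", "co-occur"],
]

recommended = recommend_terms(clinical, biomedical)
candidate_groups = [{"delirium", "confusion", "encephalopathy"}, {"symptoms", "signs"}]
filtered = filter_synonym_groups(candidate_groups, recommended)
print(recommended)
print(filtered)
```

In this sketch, "patient" is dropped because it never appears in the biomedical texts, and the synonym group containing only biomedical-corpus terms is discarded, mirroring the filtering role that the recommended term set plays in the abstract.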

RESULTS

Across the three use cases, we held the number of recommended terms relatively constant by changing SEAM's parameters. Experts seem to find more than 300 recommended terms overwhelming. The approval rate of recommended terms increased as the number and specificity of clinical documents in the corpus increased. The approval rate was 60% when there were 199 clinical documents that were not specific to the ontology domain and 90% when there were 2879 documents very specific to the target domain. We found that experts also preferred fewer than 100 recommended synonym groups. Approval rates for synonym recommendations remained low, varying from 43% to 25% as the number of journal articles increased from 19 to 47. Overall, the number of recommended hierarchical relationships was very low, although approval was good, varying between 67% and 31%.

CONCLUSION

SEAM produced a concise list of recommended clinical terms, synonyms, and hierarchical relationships regardless of medical domain.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4c9/4396714/f1db7c17cece/13326_2015_11_Fig1_HTML.jpg
