Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation.

Affiliations

Department of Health Informatics and Administration, Center for Biomedical Data and Language Processing, University of Wisconsin-Milwaukee, 2025 E Newport Ave, NWQ-B Room 6469, Milwaukee, WI, 53211, USA.

Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, 205 3rd Ave SW, Rochester, MN, 55905, USA.

Publication information

BMC Med Inform Decis Mak. 2018 Jul 6;18(1):61. doi: 10.1186/s12911-018-0645-3.

Abstract

BACKGROUND

Knowledge models facilitate information retrieval and knowledge base development, and thereby support new knowledge discovery, which ultimately enables decision support applications. Most existing work has employed machine learning techniques to construct knowledge bases, but these approaches often suffer from low precision in extracting entities and relationships. In this paper, we describe a data-driven sublanguage pattern mining method that can be used to create a knowledge model. Our model generation pipeline combines natural language processing (NLP) and semantic network analysis.

METHODS

As a use case for our pipeline, we used data from an open-source imaging case repository, Radiopaedia.org, to generate a knowledge model that represents the contents of medical imaging reports. We extracted entities and relationships using the Stanford part-of-speech parser and the "Subject:Relationship:Object" syntactic data schema. The identified noun phrases were tagged with Unified Medical Language System (UMLS) semantic types. We evaluated the model on a dataset of 83 image notes from four data sources.
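To make the "Subject:Relationship:Object" schema concrete, the following is a minimal sketch, not the authors' implementation. It assumes a triple has already been extracted by a parser and is stored as a colon-delimited string. The toy lexicon, the example phrases, the "located in" relationship, and the names `Triple`, `parse_triple`, and `tag_triple` are all hypothetical. A real pipeline would look up semantic types through a UMLS tool rather than a hand-written dictionary.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon mapping noun phrases to UMLS semantic type names.
# The phrase-to-type assignments are illustrative only.
TOY_SEMANTIC_TYPES = {
    "mass": "Finding",
    "left lung": "Body Part, Organ, or Organ Component",
}


@dataclass(frozen=True)
class Triple:
    subject: str
    relationship: str
    obj: str


def parse_triple(record: str) -> Triple:
    """Split a 'Subject:Relationship:Object' string into its three fields."""
    subject, relationship, obj = (part.strip() for part in record.split(":", 2))
    return Triple(subject, relationship, obj)


def tag_triple(triple: Triple) -> tuple:
    """Replace the subject and object noun phrases with semantic types."""
    subject_type = TOY_SEMANTIC_TYPES.get(triple.subject.lower(), "Unknown")
    object_type = TOY_SEMANTIC_TYPES.get(triple.obj.lower(), "Unknown")
    return (subject_type, triple.relationship, object_type)


example = parse_triple("mass : located in : left lung")
tagged = tag_triple(example)
```

Tagging both ends of each triple with a semantic type is what allows individual statements to be aggregated into type-level patterns.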

RESULTS

A semantic type network was built based on the co-occurrence of 135 UMLS semantic types in 23,410 medical image reports. By regrouping the semantic types and generalizing the semantic network, we created a knowledge model that contains 14 semantic categories. Our knowledge model was able to cover 98% of the content in the evaluation corpus and revealed 97% of the relationships. Machine annotation achieved a precision of 87%, recall of 79%, and F-score of 82%.
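The two quantitative ideas in this paragraph, a co-occurrence network and an F-score, can be sketched as follows. This is an illustration under stated assumptions rather than the published method: it assumes co-occurrence is counted once per unit of text (a report or a sentence; the abstract does not say which), and the function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(units):
    """Count how often each pair of semantic types appears in the same unit.

    `units` is an iterable of lists of semantic type names, one list per
    unit of text. Each unordered pair is counted at most once per unit.
    """
    edges = Counter()
    for types in units:
        for a, b in combinations(sorted(set(types)), 2):
            edges[(a, b)] += 1
    return edges


def f_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample data: two units tagged with semantic types.
sample_units = [
    ["Finding", "Body Location or Region"],
    ["Finding", "Body Location or Region", "Diagnostic Procedure"],
]
edges = cooccurrence_edges(sample_units)
f1_from_rounded = f_score(0.87, 0.79)
```

With the rounded precision and recall reported above, the harmonic mean comes out near 0.83. The reported 82% was presumably computed from unrounded values, so a small difference of this kind is expected.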

CONCLUSION

The results indicated that our pipeline was able to produce a comprehensive content-based knowledge model that could represent context from various sources in the same domain.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6cf/6035419/434a12848032/12911_2018_645_Fig1_HTML.jpg
