

Toward Complete Structured Information Extraction from Radiology Reports Using Machine Learning.

Affiliations

Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, PA, 19104, USA.

Boston University School of Medicine, Boston, MA, 02119, USA.

Publication information

J Digit Imaging. 2019 Aug;32(4):554-564. doi: 10.1007/s10278-019-00234-y.

Abstract

Unstructured and semi-structured radiology reports represent an underutilized trove of information for machine learning (ML)-based clinical informatics applications, including abnormality tracking systems, research cohort identification, point-of-care summarization, semi-automated report writing, and the generation of weak data labels for training image processing systems. Clinical ML systems must be interpretable to ensure user trust. To create interpretable models applicable to all of these tasks, we can build general-purpose systems that extract all relevant human-level assertions or "facts" documented in reports; identifying these facts is an information extraction (IE) task. Previous IE work in radiology has focused on a limited set of information and has extracted isolated entities (i.e., single words such as "lesion" or "cyst") rather than complete facts, which require the linking of multiple entities and modifiers. Here, we develop a prototype system to extract all useful information in abdominopelvic radiology reports (findings, recommendations, clinical history, procedures, imaging indications and limitations, etc.) in the form of complete, contextualized facts. We construct an information schema to capture the bulk of information in reports, develop real-time ML models to extract this information, and demonstrate the feasibility and performance of the system.
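The difference between isolated entities and complete facts can be illustrated with a small data structure. This is a minimal sketch under stated assumptions, not the authors' information schema: the class names `Fact` and `Modifier`, the fields `anchor`, `category`, and `kind`, and the example report phrase ("a 2.1 cm cyst in the right kidney") are all hypothetical and chosen only for illustration. The fact is built by hand, not produced by any model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Modifier:
    """A span that qualifies an anchor entity (hypothetical field names)."""
    kind: str   # e.g. "location", "size", "certainty"
    text: str


@dataclass
class Fact:
    """A complete fact: one anchor entity linked to its modifiers (hypothetical schema)."""
    anchor: str
    category: str   # e.g. "finding", "recommendation", "clinical history"
    modifiers: List[Modifier] = field(default_factory=list)

    def describe(self) -> str:
        """Render the fact as a single readable string."""
        parts = [f"{m.kind}={m.text}" for m in self.modifiers]
        suffix = f" ({', '.join(parts)})" if parts else ""
        return f"[{self.category}] {self.anchor}{suffix}"


# Isolated-entity output for the made-up phrase "a 2.1 cm cyst in the right kidney":
# the words are found, but nothing records which size or location belongs to which entity.
isolated_entities = ["cyst", "right kidney", "2.1 cm"]

# The same phrase as one complete fact, with the modifiers explicitly linked to the anchor.
fact = Fact(
    anchor="cyst",
    category="finding",
    modifiers=[Modifier("location", "right kidney"), Modifier("size", "2.1 cm")],
)

print(fact.describe())  # [finding] cyst (location=right kidney, size=2.1 cm)
```

Keeping modifiers attached to their anchor is what lets a downstream task, such as abnormality tracking, ask for "the size of the right renal cyst" rather than for a bag of unrelated words.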


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c44/6646440/90e5670ebdec/10278_2019_234_Fig1_HTML.jpg
