Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies.

Author information

Zheng Shuai, Lu James J, Ghasemzadeh Nima, Hayek Salim S, Quyyumi Arshed A, Wang Fusheng

Affiliations

Department of Biomedical Informatics, Emory University, Atlanta, GA, United States.

Department of Mathematics and Computer Science, Emory University, Atlanta, GA, United States.

Publication information

JMIR Med Inform. 2017 May 9;5(2):e12. doi: 10.2196/medinform.7235.

Abstract

BACKGROUND

Extracting structured data from narrative medical reports is complicated by heterogeneous report structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches cannot incorporate user feedback to improve the extraction algorithm in real time.

OBJECTIVE

Our goal was to provide a generic information extraction framework that supports diverse clinical reports and enables dynamic interaction between a human and a machine to produce highly accurate results.

METHODS

IDEAL-X, a clinical information extraction system, is built on online machine learning. It processes one document at a time and records user interactions as feedback that updates the learning model in real time. The updated model then predicts the values to extract from subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents can be batch processed. A customizable controlled vocabulary can also be used to support extraction.
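
The review-then-batch workflow described above can be illustrated with a small sketch. It is not IDEAL-X's implementation, since the abstract does not specify the model. It assumes a perceptron-style online learner that scores numeric candidate tokens by their neighboring words, a hypothetical controlled vocabulary (`CONTROLLED_VOCAB`) that maps a synonym such as "EF" to a canonical term before features are computed, and hypothetical helper names (`OnlineExtractor`, `interactive_then_batch`, `simulated_user`). The `threshold` and `window` parameters stand in for the "user-acceptable threshold".

```python
from collections import defaultdict, deque

# Hypothetical controlled vocabulary: maps synonymous labels to one canonical term.
CONTROLLED_VOCAB = {"ef": "lvef"}


def normalize(word):
    """Lower-case a token and map it to its canonical vocabulary term, if any."""
    w = word.lower()
    return CONTROLLED_VOCAB.get(w, w)


def candidates(tokens):
    """Every numeric token (e.g. '55' or '1.5') is a possible value."""
    return [i for i, t in enumerate(tokens) if t.replace(".", "", 1).isdigit()]


def features(tokens, i):
    """Context features of candidate i: the normalized previous and next tokens."""
    prev = normalize(tokens[i - 1]) if i > 0 else "<s>"
    nxt = normalize(tokens[i + 1]) if i + 1 < len(tokens) else "</s>"
    return ["prev=" + prev, "next=" + nxt]


class OnlineExtractor:
    """Perceptron-style online learner for a single field (illustrative only)."""

    def __init__(self):
        self.weights = defaultdict(float)

    def score(self, tokens, i):
        return sum(self.weights[f] for f in features(tokens, i))

    def predict(self, tokens):
        cands = candidates(tokens)
        # max() keeps the first candidate on ties, so an untrained model picks the first number
        return max(cands, key=lambda i: self.score(tokens, i)) if cands else None

    def update(self, tokens, predicted, correct):
        """User feedback: reward the confirmed value's features, penalize a wrong guess."""
        if predicted == correct:
            return
        for f in features(tokens, correct):
            self.weights[f] += 1.0
        if predicted is not None:
            for f in features(tokens, predicted):
                self.weights[f] -= 1.0


def interactive_then_batch(docs, ask_user, threshold=0.9, window=5):
    """Review documents one at a time; once accuracy over the last `window`
    reviewed documents reaches `threshold`, accept predictions without review."""
    model = OnlineExtractor()
    recent = deque(maxlen=window)
    values = []
    for tokens in docs:
        pred = model.predict(tokens)
        if len(recent) == window and sum(recent) / window >= threshold:
            values.append(tokens[pred] if pred is not None else None)  # batch mode
            continue
        correct = ask_user(tokens, pred)  # index of the value the user confirms
        recent.append(pred == correct)
        model.update(tokens, pred, correct)  # model changes before the next document
        values.append(tokens[correct])
    return model, values


# Usage: simulate a user who always marks the number after "LVEF".
docs = [s.split() for s in [
    "HR 72 bpm . LVEF 55 %",
    "BP 120 / 80 . LVEF 60 %",
    "LVEF 35 % ; HR 90",
    "HR 65 , LVEF 50 %",
    "LVEF 40 % ; BP 110 / 70",
    "RR 16 . LVEF 65 %",
]]
reviews = []


def simulated_user(tokens, pred):
    reviews.append(pred)
    return tokens.index("LVEF") + 1


model, values = interactive_then_batch(docs, simulated_user, threshold=1.0, window=2)
print(values, "reviewed:", len(reviews))
```

Updating on every reviewed document is what makes the learner online: each correction changes the weights used for the very next prediction. The sliding window means batch mode starts only after recent predictions have been consistently right, and the vocabulary lets a report that writes "EF" reuse what was learned from "LVEF".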

RESULTS

Three datasets, grouped by report style, were used in the experiments: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports (each combining a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report). Data were extracted using 3 methods: online machine learning, controlled vocabularies, and a combination of the two. The system achieved F1 scores greater than 95%.
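
The results are reported as F1 scores, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The sketch below assumes, for illustration only, that extracted values are compared with a manually annotated gold standard as exact (document, field, value) triples and micro-averaged. The matching criterion, the function name `precision_recall_f1`, and the example data are hypothetical, not the study's evaluation protocol.

```python
def precision_recall_f1(predicted, gold):
    """Micro-averaged precision, recall, and F1 over (doc_id, field, value) triples.

    A prediction counts as a true positive only if the same triple appears in gold.
    """
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: one gold value (report 2, HR) was not extracted.
gold = {(1, "LVEF", "55"), (1, "HR", "72"), (2, "LVEF", "60"), (2, "HR", "88")}
predicted = {(1, "LVEF", "55"), (1, "HR", "72"), (2, "LVEF", "60")}
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # → P=1.000 R=0.750 F1=0.857
```

A missed value lowers recall but not precision, whereas a wrong value lowers both: it is a false positive, and its gold triple is left unmatched.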

CONCLUSIONS

IDEAL-X adopts a unique approach that combines online machine learning with controlled vocabularies to support data extraction from clinical reports. Because the system learns and improves quickly, it is highly adaptable.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/638b/5442348/80cc5075cad5/medinform_v5i2e12_fig1.jpg