Zheng Shuai, Lu James J, Ghasemzadeh Nima, Hayek Salim S, Quyyumi Arshed A, Wang Fusheng
Department of Biomedical Informatics, Emory University, Atlanta, GA, United States.
Department of Mathematics and Computer Science, Emory University, Atlanta, GA, United States.
JMIR Med Inform. 2017 May 9;5(2):e12. doi: 10.2196/medinform.7235.
Extracting structured data from narrative medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time.
Our goal was to provide a generic information extraction framework that supports diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results.
A clinical information extraction system, IDEAL-X, was built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.
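The document-at-a-time loop described above can be illustrated with a minimal sketch. This is not the IDEAL-X implementation: the class `OnlineExtractor`, the function `process`, the callback `ask_user`, and the cue-counting model are all hypothetical stand-ins chosen for brevity, and the `threshold` and `window` parameters are assumed ways of expressing a "user-acceptable" accuracy.

```python
from collections import Counter, deque


class OnlineExtractor:
    """Toy online learner (an assumption, not IDEAL-X's model):
    it remembers which token tends to precede a confirmed value."""

    def __init__(self):
        # cue token -> number of times it directly preceded a confirmed value
        self.cue_counts = Counter()

    def predict(self, tokens):
        # Return the token that follows the most trusted cue seen so far,
        # or None if no known cue occurs in this document.
        best_value, best_count = None, 0
        for cue, value in zip(tokens, tokens[1:]):
            if self.cue_counts[cue] > best_count:
                best_value, best_count = value, self.cue_counts[cue]
        return best_value

    def update(self, tokens, confirmed_value):
        # Feedback step: reinforce every cue that precedes the confirmed value.
        for cue, value in zip(tokens, tokens[1:]):
            if value == confirmed_value:
                self.cue_counts[cue] += 1


def process(documents, ask_user, threshold=0.95, window=3):
    """Review documents one at a time until the accuracy over the last
    `window` reviewed documents reaches `threshold`, then batch the rest."""
    model = OnlineExtractor()
    recent = deque(maxlen=window)  # True/False per recently reviewed document
    results = []
    interactive = True
    for text in documents:
        tokens = text.split()
        guess = model.predict(tokens)
        if interactive:
            truth = ask_user(text, guess)  # user confirms or corrects the guess
            model.update(tokens, truth)    # model is updated immediately
            recent.append(guess == truth)
            results.append(truth)
            if len(recent) == window and sum(recent) / window >= threshold:
                interactive = False        # remaining documents are batch processed
        else:
            results.append(guess)
    return results, model
```

In this sketch the user is consulted only until the recent predictions are good enough; after that, the model's guesses are accepted without review, mirroring the switch from interactive to batch processing.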
Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each combining a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%.
IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve and is therefore highly adaptable.
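For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The abstract does not say how matches were counted, so the sketch below assumes the simplest convention: one value per document, counted as correct only on an exact match. The function name `f1_score` and the dictionary layout are illustrative assumptions.

```python
def f1_score(predicted, gold):
    """Compute F1 for one extracted field.

    predicted, gold: dicts mapping a document id to the extracted value,
    with None meaning "nothing extracted" / "no value present".
    Exact-match scoring is an assumption, not taken from the paper.
    """
    # True positives: a value was extracted and it equals the gold value.
    tp = sum(1 for doc_id, value in predicted.items()
             if value is not None and gold.get(doc_id) == value)
    n_predicted = sum(1 for value in predicted.values() if value is not None)
    n_gold = sum(1 for value in gold.values() if value is not None)

    precision = tp / n_predicted if n_predicted else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With 1 correct value out of 2 extracted and 3 expected, precision is 0.5, recall is 1/3, and F1 is 0.4.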