Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies.

Authors

Zheng Shuai, Lu James J, Ghasemzadeh Nima, Hayek Salim S, Quyyumi Arshed A, Wang Fusheng

Affiliations

Department of Biomedical Informatics, Emory University, Atlanta, GA, United States.

Department of Mathematics and Computer Science, Emory University, Atlanta, GA, United States.

Publication information

JMIR Med Inform. 2017 May 9;5(2):e12. doi: 10.2196/medinform.7235.

DOI:10.2196/medinform.7235

PMID:28487265

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5442348/

Abstract

BACKGROUND

Extracting structured data from narrative medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches cannot incorporate user feedback to improve the extraction algorithm in real time.

OBJECTIVE

Our goal was to provide a generic information extraction framework that supports diverse clinical reports and enables dynamic interaction between a human and a machine to produce highly accurate results.

METHODS

A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.
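The interactive workflow described in the Methods can be illustrated with a minimal sketch. It is not IDEAL-X's actual algorithm: the cue-word learner, the class and function names, the accuracy window, and the sample sentences are all assumptions made for illustration. The sketch only shows the general shape of the loop: predict on one document, take the user's confirmed value as feedback, update the model, and stop interactive review once recent accuracy reaches a threshold.

```python
from collections import deque


class OnlineCueExtractor:
    """Toy online learner (hypothetical): learns which cue word precedes a value."""

    def __init__(self):
        # cue word -> weight, adjusted after every reviewed document
        self.cue_weights = {}

    def predict(self, tokens):
        """Return the token that follows the highest-weighted known cue, or None."""
        best_value, best_weight = None, 0.0
        for cue, value in zip(tokens, tokens[1:]):
            weight = self.cue_weights.get(cue, 0.0)
            if weight > best_weight:
                best_value, best_weight = value, weight
        return best_value

    def update(self, tokens, predicted, confirmed):
        """Apply user feedback: reward the cue before the confirmed value,
        penalize the cue that led to a wrong prediction."""
        for cue, value in zip(tokens, tokens[1:]):
            if value == confirmed:
                self.cue_weights[cue] = self.cue_weights.get(cue, 0.0) + 1.0
            elif value == predicted:
                self.cue_weights[cue] = self.cue_weights.get(cue, 0.0) - 1.0


def run_interactive(documents, confirmed_values, threshold=0.9, window=3):
    """Review documents one at a time until recent accuracy reaches the threshold.

    confirmed_values stands in for the user's confirmations or corrections.
    Returns the model and the index of the first document left for batch mode.
    """
    model = OnlineCueExtractor()
    recent = deque(maxlen=window)  # correctness of the last `window` predictions
    for i, (text, confirmed) in enumerate(zip(documents, confirmed_values)):
        tokens = text.lower().split()
        predicted = model.predict(tokens)
        recent.append(predicted == confirmed)
        model.update(tokens, predicted, confirmed)
        if len(recent) == window and sum(recent) / window >= threshold:
            return model, i + 1
    return model, len(documents)


# Made-up report snippets; the field of interest is the patient's age.
reports = [
    "Patient age: 63 with chest pain",
    "age: 71 hypertension noted",
    "Male age: 58 prior MI",
    "history of diabetes age: 66",
    "age: 49 no prior events",
]
ages = ["63", "71", "58", "66", "49"]

model, batch_start = run_interactive(reports, ages)
# Documents from batch_start onward can be processed without review.
batch_results = [model.predict(r.lower().split()) for r in reports[batch_start:]]
```

In this toy run the first document yields no prediction, so the user's entry is the only signal; later documents are predicted from the learned cue, and the last one is handled in batch mode.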

RESULTS

Three datasets, grouped by report style, were used for experiments: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each of which combines a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of the two. The system delivers results with F1 scores greater than 95%.
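For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard calculation; the function name and the counts in the usage line are hypothetical and are not taken from the study's results.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) computed from extraction counts."""
    predicted = true_positives + false_positives   # values the system extracted
    actual = true_positives + false_negatives      # values present in the reports
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 96 correct extractions, 3 spurious, 4 missed.
precision, recall, f1 = precision_recall_f1(96, 3, 4)
```

With these illustrative counts the F1 score is 192/199, or roughly 0.965.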

CONCLUSIONS

IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system learns and improves quickly and is therefore highly adaptable.
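As a rough illustration of how a controlled vocabulary can support extraction, the sketch below maps surface forms found in report text to canonical concepts. The vocabulary entries, function names, and sample sentence are assumptions for illustration; the abstract does not describe how IDEAL-X represents or matches its vocabularies.

```python
import re

# Hypothetical vocabulary: canonical concept -> surface forms seen in reports
VOCABULARY = {
    "hypertension": ["hypertension", "htn", "high blood pressure"],
    "diabetes mellitus": ["diabetes mellitus", "diabetes", "dm"],
}


def build_matcher(vocabulary):
    """Compile one regex over all surface forms and a form -> concept lookup."""
    lookup = {}
    for concept, forms in vocabulary.items():
        for form in forms:
            lookup[form.lower()] = concept
    # Longest forms first so "diabetes mellitus" is preferred over "diabetes".
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(form) for form in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def extract_concepts(text, pattern, lookup):
    """Return canonical concepts mentioned in text, in order of first mention."""
    found = []
    for match in pattern.finditer(text):
        concept = lookup[match.group(0).lower()]
        if concept not in found:
            found.append(concept)
    return found


pattern, lookup = build_matcher(VOCABULARY)
concepts = extract_concepts("History of HTN and diabetes mellitus.", pattern, lookup)
```

Because the vocabulary is plain data, users can extend it without retraining, which is one way a dictionary-based method could complement a learned model.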

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/638b/5442348/80cc5075cad5/medinform_v5i2e12_fig1.jpg
