Mayo Clinic, Rochester, MN, USA.
Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
J Biomed Inform. 2019 Nov;99:103310. doi: 10.1016/j.jbi.2019.103310. Epub 2019 Oct 14.
BACKGROUND: Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for secondary use of electronic health record (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used to model and integrate both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework that identifies patients with obesity and its multiple comorbidities from semi-structured discharge summaries, leveraging a FHIR-based clinical data normalization pipeline (known as NLP2FHIR).
METHODS: We implemented a multi-class and multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the FHIR-based EHR phenotyping framework. The framework has two core parts: (a) the conversion of discharge summaries into corresponding FHIR resources (Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory) using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers that predict the disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used macro- and micro-averaged precision (P), recall (R), and F1 score (F1) to evaluate classifier performance. We validated the framework using a second obesity dataset extracted from the MIMIC-III database.
RESULTS: Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as instances of the FHIR Composition resource consisting of 5677 records with 16 unique section types. After the NLP processing and FHIR modeling, a set of 244,438 FHIR clinical resource instances was generated. Among the four machine learning classifiers, the random forest algorithm performed best, with F1-micro 0.9466 and F1-macro 0.7887 for intuitive classification (reflecting medical professionals' judgments) and F1-micro 0.9536 and F1-macro 0.6524 for textual classification (reflecting judgments based on explicitly reported disease information). The MIMIC-III obesity dataset was successfully integrated for prediction with minimal configuration of the NLP2FHIR pipeline and machine learning models.
CONCLUSIONS: The study demonstrated that the FHIR-based EHR phenotyping approach could effectively identify the state of obesity and multiple comorbidities using semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing the interpretability of machine learning-based phenotyping algorithms.
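The gap between the micro- and macro-averaged F1 scores reported above is typical when class frequencies are unbalanced: micro-averaging pools all decisions, so frequent classes dominate, while macro-averaging gives every class equal weight, so a rare class that is never predicted pulls the score down. The following is a minimal sketch of the two averages for a single disease, not the evaluation code used in the study; the function names, the class labels ("present", "absent", "questionable"), and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label, multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    # Macro: average the per-class F1 scores, each class weighted equally.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Hypothetical toy data: one rare class ("questionable") is always missed.
y_true = ["absent"] * 8 + ["present", "questionable"]
y_pred = ["absent"] * 8 + ["present", "absent"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.4f}, macro-F1 = {macro:.4f}")
```

In this toy case a single missed rare-class document leaves the micro-averaged score high while the macro-averaged score drops sharply, which is the same qualitative pattern as the reported results.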