Data integration of structured and unstructured sources for assigning clinical codes to patient stays.

Author information

Scheurwegs Elyne, Luyckx Kim, Luyten Léon, Daelemans Walter, Van den Bulcke Tim

Affiliations

ADReM (Advanced Database Research and Modelling), Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp, Antwerp, Belgium

Department of Medical Information, Antwerp University Hospital, Antwerp, Belgium.

Publication information

J Am Med Inform Assoc. 2016 Apr;23(e1):e11-9. doi: 10.1093/jamia/ocv115. Epub 2015 Aug 27.

DOI: 10.1093/jamia/ocv115

PMID: 26316458

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4954635/

Abstract

OBJECTIVE

Enormous amounts of healthcare data are becoming increasingly accessible through the large-scale adoption of electronic health records. In this work, structured and unstructured (textual) data are combined to assign clinical diagnostic and procedural codes (specifically ICD-9-CM) to patient stays. We investigate whether integrating these heterogeneous data types improves prediction strength compared to using the data types in isolation.

METHODS

Two separate data integration approaches were evaluated. Early data integration combines features of several sources within a single model, and late data integration learns a separate model per data source and combines these predictions with a meta-learner. This is evaluated on data sources and clinical codes from a broad set of medical specialties.
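The two integration strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one binary classifier per ICD-9-CM code and uses a tiny hand-written logistic regression as both the base learner and the meta-learner. It runs on a hypothetical toy dataset of six stays, each with two structured features and three bag-of-words text features. Early integration concatenates the per-source feature vectors into a single model. Late integration trains one model per source and fits the meta-learner on out-of-fold scores, so the meta-learner never sees scores a base model produced on its own training data. The function names (`train_logreg`, `early_integration`, `late_integration`, `predict_late`) are illustrative only.

```python
import math

def predict_proba(model, x):
    """Logistic score for one feature vector; z is clamped to avoid overflow."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=200, lr=0.1):
    """Fit a tiny logistic regression with batch gradient descent."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def early_integration(sources, y):
    """Early integration: concatenate all sources into one vector per stay."""
    X = [sum(rows, []) for rows in zip(*sources)]
    return train_logreg(X, y)

def out_of_fold_scores(X, y, k=3):
    """Score each stay with a model trained on the other folds only."""
    scores = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        model = train_logreg([X[i] for i in train], [y[i] for i in train])
        for i in range(len(X)):
            if i % k == fold:
                scores[i] = predict_proba(model, X[i])
    return scores

def late_integration(sources, y, k=3):
    """Late integration: one model per source, combined by a meta-learner."""
    base_models = [train_logreg(X, y) for X in sources]
    per_source = [out_of_fold_scores(X, y, k) for X in sources]
    meta_X = [list(scores) for scores in zip(*per_source)]
    return base_models, train_logreg(meta_X, y)

def predict_late(base_models, meta_model, source_rows):
    """Feed each source's score for one stay into the meta-learner."""
    meta_x = [predict_proba(m, x) for m, x in zip(base_models, source_rows)]
    return predict_proba(meta_model, meta_x)

# Hypothetical toy data: 6 stays; label 1 means "stay was assigned code C".
structured = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]]
text = [[1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1]]
y = [1, 1, 0, 0, 1, 0]

early = early_integration([structured, text], y)
late = late_integration([structured, text], y)
print(predict_proba(early, structured[0] + text[0]))
print(predict_late(*late, [structured[0], text[0]]))
```

Both variants return a probability for the same stay. The design choice that matters for late integration is training the meta-learner on out-of-fold scores, a standard stacking precaution that is assumed here rather than taken from the abstract.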

RESULTS

When compared with the best individual prediction source, late data integration leads to improvements in predictive power (eg, overall F-measure increased from 30.6% to 38.3% for International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnostic codes), while early data integration is less consistent. The predictive strength strongly differs between medical specialties, both for ICD-9-CM diagnostic and procedural codes.
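The F-measure reported here is the harmonic mean of precision and recall. The sketch below computes a micro-averaged version that pools true positives, false positives and false negatives over the code sets of all stays. Whether the paper's "overall" figure is micro-averaged is an assumption. The two stays and their ICD-9-CM codes are hypothetical, and `micro_f_measure` is an illustrative name.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure: pool counts over all stays, then combine."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))  # correct codes
    fp = sum(len(p - g) for g, p in zip(gold, predicted))  # spurious codes
    fn = sum(len(g - p) for g, p in zip(gold, predicted))  # missed codes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold and predicted code sets for two stays.
gold = [{"428.0", "401.9"}, {"250.00"}]
predicted = [{"428.0"}, {"250.00", "401.9"}]
# tp = 2, fp = 1, fn = 1, so precision = recall = 2/3 and F = 2/3.
print(micro_f_measure(gold, predicted))
```

Because counts are pooled, frequent codes dominate a micro-averaged score. This is one reason a single overall figure can hide the large differences between medical specialties that the abstract reports.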

DISCUSSION

Structured data provides complementary information to unstructured data (and vice versa) for predicting ICD-9-CM codes. This can be captured most effectively by the proposed late data integration approach.

CONCLUSIONS

We demonstrated that models using multiple electronic health record data sources systematically outperform models using data sources in isolation in the task of predicting ICD-9-CM codes over a broad range of medical specialties.

