A framework for feature extraction from hospital medical data with applications in risk prediction.

Author information

Tran Truyen, Luo Wei, Phung Dinh, Gupta Sunil, Rana Santu, Kennedy Richard Lee, Larkins Ann, Venkatesh Svetha

Affiliations

Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, VIC, 3220, Australia.

Department of Computing, Curtin University, Perth, WA, Australia.

Publication information

BMC Bioinformatics. 2014 Dec 30;15(1):425. doi: 10.1186/s12859-014-0425-8.

Abstract

BACKGROUND

Feature engineering is a time-consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a predefined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features with baselines generated from the Elixhauser comorbidities.
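The abstract does not spell out the entity schema or the filters. As a rough illustration of schema-driven, disease-agnostic extraction, the sketch below assumes a hypothetical `Event` record (patient, date, entity type, code), groups events into per-patient time-ordered sequences, and applies look-back windows of 30, 90 and 365 days as simple count filters, so one code yields one feature per temporal scale. The class, the function names, the window lengths and the example codes are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One timestamped entry from a hospital record (hypothetical schema)."""
    patient_id: str
    day: date
    entity: str  # entity type, e.g. "diagnosis" or "medication"
    code: str    # code within that entity type (illustrative values below)


def to_event_sequences(events):
    """Group events by patient and sort each patient's events by date."""
    seqs = {}
    for ev in events:
        seqs.setdefault(ev.patient_id, []).append(ev)
    for seq in seqs.values():
        seq.sort(key=lambda ev: ev.day)
    return seqs


def window_counts(seq, index_day, windows=(30, 90, 365)):
    """Count entity:code occurrences in look-back windows ending at index_day.

    Each window length acts as one filter, so a single code yields one
    feature per temporal scale. Events after index_day are skipped so that
    no future information leaks into the features.
    """
    features = Counter()
    for ev in seq:
        lag = (index_day - ev.day).days
        if lag < 0:
            continue
        for w in windows:
            if lag <= w:
                features[f"{ev.entity}:{ev.code}@{w}d"] += 1
    return dict(features)


# Example with made-up events for one patient (hypothetical codes).
events = [
    Event("p1", date(2010, 1, 5), "diagnosis", "J44.1"),
    Event("p1", date(2010, 3, 1), "diagnosis", "J44.1"),
    Event("p1", date(2009, 6, 1), "medication", "R03"),
]
seqs = to_event_sequences(events)
features = window_counts(seqs["p1"], index_day=date(2010, 3, 10))
print(features)
```

Because the filters only look at entity types and codes, the same function applies unchanged to any disease cohort; adding a new entity type to the schema simply produces new feature names.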

RESULTS

Hospital medical records were transformed into event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task and compared with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6 and 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, the auto-extracted feature set using socio-demographic information and medical records outperformed baselines derived from socio-demographic information and Elixhauser comorbidities across the 20 settings (5 prediction horizons for each of 4 diseases). In particular, for 30-day prediction, the AUCs are: COPD-baseline: 0.60 (95% CI: 0.57, 0.63), auto-extracted: 0.67 (0.64, 0.70); diabetes-baseline: 0.60 (0.58, 0.63), auto-extracted: 0.67 (0.64, 0.69); mental disorders-baseline: 0.57 (0.54, 0.60), auto-extracted: 0.69 (0.64, 0.70); pneumonia-baseline: 0.61 (0.59, 0.63), auto-extracted: 0.70 (0.67, 0.72).
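The abstract states that the model is logistic regression with elastic net regularization and that performance is reported as AUC, but not how the model was fitted. The pure-Python sketch below assumes the common elastic-net objective (mean log-loss plus `lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)`, intercept unpenalized) and minimizes it with proximal gradient descent (ISTA). The solver, the hyperparameter defaults and the toy data are our own assumptions; in practice a tuned library implementation (for example glmnet or scikit-learn) would be used. AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_elastic_net_logreg(X, y, lam=0.01, alpha=0.5, lr=0.1, n_iter=2000):
    """Minimise mean log-loss + lam*(alpha*||w||_1 + (1-alpha)/2*||w||_2^2)
    with proximal gradient descent (ISTA). The intercept b is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Gradient of the mean log-loss with respect to w and b.
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += r / n
            for j in range(d):
                grad_w[j] += r * xi[j] / n
        b -= lr * grad_b
        for j in range(d):
            # Gradient step on the smooth part (log-loss + L2 term) ...
            v = w[j] - lr * (grad_w[j] + lam * (1 - alpha) * w[j])
            # ... followed by the proximal step for the L1 term.
            w[j] = soft_threshold(v, lr * lam * alpha)
    return w, b


def predict_proba(w, b, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(y_true, scores):
    """ROC AUC: P(random positive outranks random negative), ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: feature 0 is informative, feature 1 is balanced noise.
X = [[0.0, 1.0], [0.5, 0.0], [1.0, 1.0], [1.5, 0.0],
     [2.5, 1.0], [3.0, 0.0], [3.5, 1.0], [4.0, 0.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_elastic_net_logreg(X, y)
probs = predict_proba(w, b, X)
print(w, b, auc(y, probs))
```

For brevity the toy example scores the training data; the study instead fits on a derivation cohort and reports AUC on a validation cohort from a non-overlapping period. The L1 part of the penalty can drive uninformative weights to exactly zero, which is useful when auto-extraction produces many sparse features.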

CONCLUSIONS

The advantages of automatically extracting standard features from complex medical records in a disease- and task-agnostic manner were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have the potential to form the foundation of complex automated analytic tasks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77b7/4310185/de0887150fbf/12859_2014_425_Fig1_HTML.jpg
