Lister Hill National Center for Biomedical Communications, US National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.
J Am Med Inform Assoc. 2020 Apr 1;27(4):567-576. doi: 10.1093/jamia/ocaa004.
Reliable longitudinal risk prediction for hospitalized patients is needed to provide quality care. Our goal is to develop a generalizable model capable of leveraging clinical notes to predict healthcare-associated diseases 24-96 hours in advance.
We developed a reCurrent Additive Network for Temporal RIsk Prediction (CANTRIP) to predict the risk of hospital-acquired (occurring ≥ 48 hours after admission) acute kidney injury, pressure injury, or anemia ≥ 24 hours before it is implicated by the patient's chart, labs, or notes. We rely on the MIMIC-III critical care database and extract distinct positive and negative cohorts for each disease. We retrospectively determine the date of event using structured and unstructured criteria and use it as a form of indirect supervision to train and evaluate CANTRIP to predict disease risk from clinical notes.
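The labeling window described above can be sketched as follows. This is a minimal illustration, not the CANTRIP implementation. It assumes each admission has an admission timestamp, an optional date-of-event timestamp (absent for negative cohorts), and timestamped notes. The names `Note`, `is_hospital_acquired`, and `eligible_notes` are hypothetical and are not taken from the CANTRIP repository. Only the 48-hour and 24-hour thresholds come from the text. Keeping every note for negative admissions is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Thresholds taken from the abstract; everything else here is illustrative.
MIN_ONSET_AFTER_ADMISSION = timedelta(hours=48)  # "hospital acquired"
MIN_LEAD_TIME = timedelta(hours=24)              # predict >= 24 h before the event


@dataclass
class Note:
    """A hypothetical timestamped clinical note."""
    charted_at: datetime
    text: str


def is_hospital_acquired(admitted_at: datetime, event_at: datetime) -> bool:
    """True if the event occurred at least 48 hours after admission."""
    return event_at - admitted_at >= MIN_ONSET_AFTER_ADMISSION


def eligible_notes(notes: List[Note], event_at: Optional[datetime]) -> List[Note]:
    """Return the notes a model may read, in chronological order.

    For a positive admission, only notes charted at least 24 hours before the
    date of event are kept. For a negative admission (event_at is None), all
    notes are kept; this is a simplifying assumption, not the paper's rule.
    """
    if event_at is None:
        kept = list(notes)
    else:
        cutoff = event_at - MIN_LEAD_TIME
        kept = [n for n in notes if n.charted_at <= cutoff]
    return sorted(kept, key=lambda n: n.charted_at)


# Usage: an event charted 72 hours after admission.
admitted = datetime(2020, 1, 1, 8, 0)
event = datetime(2020, 1, 4, 8, 0)
notes = [
    Note(datetime(2020, 1, 3, 20, 0), "day 3 evening"),
    Note(datetime(2020, 1, 1, 12, 0), "admission note"),
    Note(datetime(2020, 1, 2, 9, 0), "day 2 progress note"),
]
print(is_hospital_acquired(admitted, event))           # True
print([n.text for n in eligible_notes(notes, event)])  # ['admission note', 'day 2 progress note']
```

The note charted on the evening of day 3 falls inside the final 24 hours before the event, so it is withheld from the model even though it precedes the event.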
Our experiments indicate that CANTRIP, operating on text alone, obtains 74%-87% area under the curve (AUC) and 77%-85% specificity. Baseline shallow models showed lower performance on all metrics, while a bidirectional long short-term memory (BiLSTM) network obtained the highest sensitivity at the cost of significantly lower specificity and precision.
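For reference, the metrics named above can be computed as in the sketch below, which uses their standard definitions. The function names are hypothetical and the scores are invented for illustration; they are not results from the paper. The abstract does not specify the evaluation protocol (decision thresholds, how AUC was computed). AUC is shown here in its pairwise (Mann-Whitney) form for the ROC curve, assuming that is the curve meant.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = disease occurred."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # share of true cases that are flagged


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)  # share of non-cases correctly left unflagged


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)  # share of flagged patients who are true cases


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. Assumes both classes are present; O(P * N) pairs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Usage with made-up scores (not results from the paper), thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                     # 2 4 1 1
print(round(roc_auc(y_true, scores), 3))  # 0.867
```

AUC is threshold-free, whereas sensitivity, specificity, and precision depend on the chosen cutoff. Lowering the threshold to 0.25 in this toy example flags every true case (sensitivity 1.0) but lowers specificity and precision to 0.6. This illustrates the general kind of trade-off reported for the BiLSTM baseline.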
Proper model architecture allows clinical text to be successfully harnessed to predict nosocomial disease, outperforming shallow models and obtaining performance similar to that of disease-specific models reported in the literature.
Clinical text on its own can provide a competitive alternative to traditional structured features (eg, lab values, vital signs). CANTRIP is able to generalize across nosocomial diseases without disease-specific feature extraction and is available at https://github.com/h4ste/cantrip.