Carmichael Harris, Coquet Jean, Sun Ran, Sang Shengtian, Groat Danielle, Asch Steven M, Bledsoe Joseph, Peltan Ithan D, Jacobs Jason R, Hernandez-Boussard Tina
Department of Medicine, Stanford University, Stanford, CA, United States; Healthcare Delivery Institute, Intermountain Healthcare, Murray, UT, United States.
Department of Medicine, Stanford University, Stanford, CA, United States.
J Biomed Inform. 2021 Jul;119:103802. doi: 10.1016/j.jbi.2021.103802. Epub 2021 May 27.
Unlike well-established diseases, for which clinical care rests on randomized trials, past experience, and training, prognosis in COVID-19 relies on a weaker foundation. Knowledge from other diseases that cause respiratory failure may inform clinical decisions in this novel disease. The objective was to predict invasive mechanical ventilation (IMV) within 48 h in patients hospitalized with COVID-19 using models trained on COVID-like diseases (CLD).
This retrospective multicenter study trained machine learning (ML) models on patients hospitalized with CLD to predict IMV within 48 h in COVID-19 patients. CLD patients were identified using diagnosis codes for bacterial pneumonia, viral pneumonia, influenza, unspecified pneumonia, and acute respiratory distress syndrome (ARDS), 2008-2019. A total of 16 cohorts were constructed, comprising every combination of the four pneumonia and influenza diagnoses plus an exploratory ARDS cohort, to determine the most appropriate training cohort. Candidate predictors included demographic and clinical parameters previously associated with poor COVID-19 outcomes. Model development used logistic regression and three tree-based algorithms: decision tree, AdaBoost, and XGBoost. Models were trained on CLD patients at Stanford Hospital Alliance (SHA) and validated on hospitalized COVID-19 patients at both SHA and Intermountain Healthcare, March 2020-July 2020.
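The count of 16 training cohorts follows from simple enumeration: four diagnoses give 15 non-empty combinations, and the exploratory ARDS cohort adds one more. A minimal sketch of that enumeration is shown here; the string labels and the function name `build_cohort_definitions` are illustrative, and treating the ARDS cohort as ARDS-only is an assumption, since the abstract does not say how it was composed.

```python
from itertools import combinations

# The four diagnoses named in the abstract; the labels themselves are illustrative.
DISEASES = [
    "bacterial_pneumonia",
    "viral_pneumonia",
    "influenza",
    "unspecified_pneumonia",
]


def build_cohort_definitions(diseases=DISEASES):
    """Return every non-empty combination of diagnoses plus one ARDS cohort."""
    cohorts = [
        frozenset(combo)
        for size in range(1, len(diseases) + 1)
        for combo in combinations(diseases, size)
    ]
    # Assumption: the exploratory cohort contains ARDS patients only.
    cohorts.append(frozenset(["ards"]))
    return cohorts


cohorts = build_cohort_definitions()
print(len(cohorts))  # 15 combinations + 1 ARDS cohort = 16
```

Each cohort definition could then be used to filter the CLD training population before fitting a model, so that the cohorts can be compared on the same validation data.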
CLD training data were obtained from SHA (n = 14,030), and validation data included 444 adult hospitalized COVID-19 patients from SHA (n = 185) and Intermountain (n = 259). XGBoost was the top-performing ML model, and among the 16 CLD training cohorts, the best model achieved an area under the curve (AUC) of 0.883 in the validation set. In COVID-19 patients, the prediction models exhibited moderate discrimination, with the best models achieving an AUC of 0.77 at SHA and 0.65 at Intermountain. The model trained on all pneumonia and influenza cohorts had the best overall performance (SHA: positive predictive value (PPV) 0.29, negative predictive value (NPV) 0.97, positive likelihood ratio (PLR) 10.7; Intermountain: PPV 0.23, NPV 0.97, PLR 10.3). We identified important factors associated with IMV that are not traditionally considered for respiratory diseases.
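For readers less familiar with the reported metrics, the sketch below shows how PPV, NPV, and PLR are derived from a 2x2 confusion matrix using their standard definitions. The function name `triage_metrics` and the example counts are hypothetical and are not taken from the study; the sketch also assumes every denominator is non-zero.

```python
def triage_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. Assumes all denominators are non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # P(IMV | predicted positive)
        "npv": tn / (tn + fn),  # P(no IMV | predicted negative)
        "plr": sensitivity / (1 - specificity),  # positive likelihood ratio
    }


# Hypothetical counts for illustration only; not data from the study.
metrics = triage_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on how common IMV is in the validation population, a high NPV alongside a modest PPV is expected when the outcome is infrequent, whereas PLR is determined by sensitivity and specificity alone.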
Prediction models derived from CLD for IMV within 48 h in patients hospitalized with COVID-19 demonstrate high specificity and can be used as a triage tool at the point of care. Novel predictors of IMV identified in COVID-19 are often overlooked in clinical practice. Lessons learned from our approach may assist other research institutes seeking to build artificial intelligence technologies for novel or rare diseases with limited data for training and validation.