Institute for Informatics (I2), Washington University School of Medicine, St. Louis, MO, United States of America.
Division of Gastroenterology, Northwestern Memorial Hospital, Chicago, IL, United States of America.
PLoS One. 2021 Aug 31;16(8):e0256428. doi: 10.1371/journal.pone.0256428. eCollection 2021.
Liver cirrhosis is a leading cause of death and affects millions of people in the United States. Early mortality prediction among patients with cirrhosis could give healthcare providers more opportunity to treat the condition effectively. We hypothesized that laboratory test results and other related diagnoses would be associated with mortality in this population. Our second hypothesis was that a deep learning model could outperform the current Model for End-Stage Liver Disease (MELD) score in predicting mortality.
We used electronic health record data from 34,575 patients with a diagnosis of cirrhosis at a large medical center to study associations with mortality. We studied three mortality time windows (365, 180, and 90 days) and two variable sets (all 41 available variables and the 4 variables in MELD-Na). Missing values were imputed using multiple imputation for continuous variables and the mode for categorical variables. Deep learning and machine learning algorithms, namely deep neural networks (DNN), random forest (RF), and logistic regression (LR), were used to study the associations between baseline features (such as laboratory measurements and diagnoses) and mortality for each time window, using 5-fold cross-validation. Models were evaluated with the area under the receiver operating characteristic curve (AUC), overall accuracy, sensitivity, and specificity.
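The three mortality windows turn a single outcome (date of death) into three binary prediction targets. A minimal sketch of that labeling, assuming each patient has a baseline date and an optional death date; the helper name `mortality_labels` and the inclusive day-count rule are hypothetical, since the abstract does not state how windows were counted:

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical labeling rule: the source does not define how windows were counted.
WINDOWS_DAYS = (365, 180, 90)


def mortality_labels(baseline: date, death: Optional[date]) -> Dict[int, int]:
    """Map each window length (days) to 1 if death fell within it, else 0.

    A missing death date is treated as "alive", which ignores censoring
    (patients lost to follow-up); a real study would need to handle that.
    """
    labels = {}
    for window in WINDOWS_DAYS:
        died = death is not None and 0 <= (death - baseline).days <= window
        labels[window] = int(died)
    return labels


# Death 120 days after baseline: positive for the 365- and 180-day windows only.
print(mortality_labels(date(2015, 1, 1), date(2015, 5, 1)))  # {365: 1, 180: 1, 90: 0}
```

Because the windows are nested, a patient who is positive at 90 days is also positive at 180 and 365 days, so the 365-day task always has the most positive cases.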
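The imputation step can be sketched as follows. Mode imputation for categorical variables matches the description above; for continuous variables, the abstract does not say which multiple-imputation method was used, so this sketch swaps in a deliberately simple random hot-deck draw that produces `m` completed copies of a column. The function names and the default `m=5` are assumptions, not the authors' implementation:

```python
import random
from collections import Counter
from typing import List, Optional


def impute_mode(values: List[Optional[str]]) -> List[str]:
    """Replace missing categorical entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def hot_deck_imputations(
    values: List[Optional[float]], m: int = 5, seed: int = 0
) -> List[List[float]]:
    """Return m completed copies of a continuous column.

    Each missing entry is filled with a random draw from the observed values,
    so the m copies differ and reflect imputation uncertainty. This is a
    simplified stand-in for model-based multiple imputation (e.g. MICE).
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    return [[rng.choice(observed) if v is None else v for v in values] for _ in range(m)]


print(impute_mode(["yes", None, "no", "yes"]))  # ['yes', 'yes', 'no', 'yes']
```

In a multiple-imputation workflow, a model would typically be fit on each completed copy and the resulting estimates or predictions pooled, rather than picking a single copy.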
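In 5-fold cross-validation, every patient appears in exactly one test fold and each model is trained on the other four folds. A minimal sketch, assuming stratification by outcome (the abstract does not say whether folds were stratified) so that each fold keeps roughly the same death rate; `stratified_k_fold` is a hypothetical helper:

```python
import random
from typing import List, Tuple


def stratified_k_fold(
    labels: List[int], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Split row indices into k (train, test) pairs with similar class ratios per fold."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin so each fold gets about 1/k of them.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


labels = [0] * 20 + [1] * 5
for train, test in stratified_k_fold(labels):
    # Prints "20 5 1" for every fold: 20 training rows, 5 test rows, 1 death per test fold.
    print(len(train), len(test), sum(labels[i] for i in test))
```

Stratifying matters most for the 90-day window, where deaths are rarest and an unstratified split could leave a fold with very few positive cases.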
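The four evaluation metrics can be computed directly from labels and predicted risk scores. A minimal sketch: AUC is computed as the probability that a randomly chosen death is scored higher than a randomly chosen survivor (ties count one half), and the threshold-based metrics assume a cutoff of 0.5, which the abstract does not specify. The quadratic pairwise loop is for clarity only; a rank-based formula scales better to tens of thousands of patients:

```python
from typing import Dict, List


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: List[int], scores: List[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity (recall on deaths), and specificity (recall on survivors)."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc(y, s))                # 0.75
print(threshold_metrics(y, s))  # {'accuracy': 0.75, 'sensitivity': 0.5, 'specificity': 1.0}
```

AUC is threshold-free, while accuracy, sensitivity, and specificity all change with the chosen cutoff, which is why the two kinds of metric are reported separately.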
Models using all variables outperformed those using only the 4 MELD-Na variables in every prediction case, and the DNN model outperformed the LR and RF models. For example, the DNN model achieved AUCs of 0.88, 0.86, and 0.85 for 90-, 180-, and 365-day mortality, respectively, compared with corresponding AUCs of 0.81, 0.79, and 0.76 for the MELD score. The DNN and LR models had significantly better F1 scores than MELD at all time points examined.
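The F1 score combines precision and recall on the death class, which makes it more informative than accuracy when deaths are a minority of patients. Comparing F1 against MELD requires turning the continuous MELD score into a yes/no prediction; the abstract does not state the cutoff used, so the value below (20) and the helper names are hypothetical:

```python
from typing import List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_at_cutoff(y_true: List[int], scores: List[float], cutoff: float) -> float:
    """Binarize a continuous risk score (model probability or MELD) at a cutoff, then compute F1."""
    return f1_score(y_true, [int(s >= cutoff) for s in scores])


# Hypothetical MELD-like scores for five patients; cutoff 20 is an illustrative assumption.
deaths = [1, 1, 0, 0, 1]
meld_scores = [25.0, 12.0, 22.0, 9.0, 30.0]
print(round(f1_at_cutoff(deaths, meld_scores, 20.0), 3))  # 0.667
```

Because F1 depends on the cutoff, a fair comparison would fix the cutoff rule (for example, one chosen on training folds only) before scoring each model.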
Besides the 4 MELD-Na variables, alkaline phosphatase, alanine aminotransferase, and hemoglobin were among the most informative features. Machine learning and deep learning models outperformed the current standard of risk prediction among patients with cirrhosis, suggesting that advanced informatics techniques are promising for risk prediction in this population.
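The 4 MELD-Na variables are serum bilirubin, serum creatinine, INR, and serum sodium. A sketch of one commonly used MELD-Na formulation follows; the abstract does not state which variant or rounding rule was used, so treat this as an assumption. Some allocation implementations additionally round the score, set creatinine to 4.0 mg/dL for patients on dialysis, and apply the sodium term only when MELD exceeds 11; none of that is modeled here:

```python
import math


def meld_na(bilirubin_mg_dl: float, creatinine_mg_dl: float, inr: float, sodium_mmol_l: float) -> float:
    """Unrounded MELD-Na from the four MELD-Na laboratory variables (assumed formulation)."""
    # Values below 1.0 are raised to 1.0 so that each log term is non-negative.
    bili = max(bilirubin_mg_dl, 1.0)
    cr = min(max(creatinine_mg_dl, 1.0), 4.0)  # creatinine is also capped at 4.0 mg/dL
    inr_ = max(inr, 1.0)
    na = min(max(sodium_mmol_l, 125.0), 137.0)  # sodium is bounded to 125-137 mmol/L
    meld = 10 * (0.957 * math.log(cr) + 0.378 * math.log(bili) + 1.120 * math.log(inr_) + 0.643)
    # Sodium correction: lower sodium raises the score, less so when MELD is already high.
    return meld + 1.32 * (137 - na) - 0.033 * meld * (137 - na)


print(round(meld_na(1.0, 1.0, 1.0, 130.0), 2))  # 14.18
```

With all laboratory values at their floors and sodium at 137 mmol/L, the score reduces to the constant term, 6.43.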