Centre of Excellence for Health, Immunity and Infections (CHIP), Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark.
Faculty of Medicine, University of Colombo, Colombo, Sri Lanka.
PLoS Negl Trop Dis. 2023 Mar 13;17(3):e0010758. doi: 10.1371/journal.pntd.0010758. eCollection 2023 Mar.
At least a third of dengue patients develop plasma leakage, which increases the risk of life-threatening complications. Predicting plasma leakage from laboratory parameters obtained in early infection, as a means of triaging patients for hospital admission, is important in resource-limited settings.
We considered a Sri Lankan cohort comprising 4,768 instances of clinical data from N = 877 patients (60.3% with confirmed dengue infection), recorded in the first 96 hours of fever. After excluding incomplete instances, the dataset was randomly split into a development set and a test set with 374 (70%) and 172 (30%) patients, respectively. From the development set, the five most informative features were selected using the minimum description length (MDL) algorithm. Random forest and light gradient boosting machine (LightGBM) classifiers were trained on the development set using nested cross-validation. An ensemble of the two learners, combined via average stacking, served as the final model to predict plasma leakage.
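The split and ensembling steps can be illustrated with a minimal sketch. It assumes that the random split is made at patient level, so that no patient contributes instances to both sets, and it interprets "average stacking" as an unweighted mean of the base learners' predicted probabilities; the study's exact procedure may differ. The function names `split_by_patient` and `average_ensemble`, the `patient_id` key, and all values are hypothetical, and training of the base learners is omitted.

```python
import random


def split_by_patient(records, test_fraction=0.3, seed=0):
    """Split records so that every patient lands in exactly one set.

    `records` is a list of dicts carrying a hypothetical "patient_id" key.
    """
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(patient_ids)
    n_test = round(len(patient_ids) * test_fraction)
    test_ids = set(patient_ids[:n_test])
    development = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return development, test


def average_ensemble(prob_lists):
    """Average per-instance probabilities across base learners (assumed unweighted)."""
    if not prob_lists:
        raise ValueError("at least one list of probabilities is required")
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all learners must score the same instances")
    return [sum(ps) / len(prob_lists) for ps in zip(*prob_lists)]


# Made-up data: 10 patients with 2 recorded instances each.
records = [{"patient_id": pid, "visit": v} for pid in range(10) for v in range(2)]
development, test = split_by_patient(records)

# Made-up probabilities from two base learners for three instances.
rf_probs = [0.25, 0.75, 0.5]
gbm_probs = [0.75, 0.25, 0.5]
ensemble_probs = average_ensemble([rf_probs, gbm_probs])
```

Splitting on patient identifiers rather than on individual instances avoids leaking repeated measurements from one patient into both sets, which would otherwise inflate test performance.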
Lymphocyte count, haemoglobin, haematocrit, age, and aspartate aminotransferase were the most informative features for predicting plasma leakage. On the test set, the final model achieved an area under the receiver operating characteristic curve (AUC) of 0.80, with a positive predictive value (PPV) of 76.9%, a negative predictive value (NPV) of 72.5%, a specificity of 87.9%, and a sensitivity of 54.8%.
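For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, PPV, and NPV follow from a binary confusion matrix. The function name `diagnostic_metrics` is hypothetical, and the counts are illustrative only; they are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # leakage cases correctly flagged
        "specificity": tn / (tn + fp),  # non-leakage cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who truly leak
        "npv": tn / (tn + fn),          # cleared patients who truly do not leak
    }


# Illustrative counts (NOT from the study).
metrics = diagnostic_metrics(tp=40, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

High specificity paired with moderate sensitivity, as reported above, means that a positive prediction is fairly reliable while a notable share of true leakage cases is missed.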
The early predictors of plasma leakage identified in this study are similar to those identified in several prior studies that used non-machine-learning methods. However, our observations strengthen the evidence base for these predictors by showing that they remain relevant even when individual data points, missing data, and non-linear associations are considered. Testing the model on different populations using these low-cost observations would identify further strengths and limitations of the presented model.