Kong Guilan, Wu Jingyi, Chu Hong, Yang Chao, Lin Yu, Lin Ke, Shi Ying, Wang Haibo, Zhang Luxia
National Institute of Health Data Science, Peking University, Beijing, China.
Advanced Institute of Information Technology, Peking University, Hangzhou, China.
JMIR Med Inform. 2021 May 19;9(5):e17886. doi: 10.2196/17886.
The increasing number of patients treated with peritoneal dialysis (PD) and their consistently high rate of hospital admissions have placed a large burden on the health care system. Early clinical interventions and optimal management of patients at a high risk of prolonged length of stay (pLOS) may help improve the medical efficiency and prognosis of PD-treated patients. If timely clinical interventions are not provided, patients at a high risk of pLOS may face a poor prognosis and high medical expenses, which will also be a burden on hospitals. Therefore, physicians need an effective pLOS prediction model for PD-treated patients.
This study aimed to develop an optimal data-driven model for predicting the pLOS risk of PD-treated patients using basic admission data.
Patient data collected using the Hospital Quality Monitoring System (HQMS) in China were used to develop pLOS prediction models. A stacking model was constructed with support vector machine, random forest (RF), and K-nearest neighbor algorithms as its base models and traditional logistic regression (LR) as its meta-model. The meta-model used the outputs of all 3 base models as input and generated the output of the stacking model. Another LR-based pLOS prediction model was built as the benchmark model. The prediction performance of the stacking model was compared with that of its base models and the benchmark model. Five-fold cross-validation was employed to develop and validate the models. Performance measures included the Brier score, area under the receiver operating characteristic curve (AUROC), estimated calibration index (ECI), accuracy, sensitivity, specificity, and geometric mean (Gm). In addition, a calibration plot was employed to visually demonstrate the calibration power of each model.
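The stacking design described above can be sketched with scikit-learn. This is an illustrative sketch only, not the authors' implementation: the synthetic data from `make_classification`, the hyperparameters, the feature scaling steps, and the use of `StackingClassifier` are all assumptions, since the abstract does not specify the software or settings used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for admission data (assumption): roughly 30% positives,
# loosely mirroring the pLOS prevalence reported in the study.
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.7, 0.3], random_state=0
)

# Three base models: SVM, random forest, and K-nearest neighbor.
# Scaling and hyperparameters are illustrative choices, not from the source.
base_models = [
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))),
]

# Logistic regression meta-model that takes the base models' predicted
# probabilities as its inputs.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=5,
)

# Five-fold cross-validation: each sample is scored by a model that did not
# see it during fitting.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(stack, X, y, cv=cv, method="predict_proba")[:, 1]

brier = brier_score_loss(y, proba)  # overall performance (lower is better)
auroc = roc_auc_score(y, proba)     # discrimination (higher is better)
print(f"Brier score: {brier:.3f}, AUROC: {auroc:.3f}")
```

A benchmark in the spirit of the study could be obtained by passing a plain `LogisticRegression()` to the same `cross_val_predict` call and comparing the two sets of scores.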
The final cohort extracted from the HQMS database consisted of 23,992 eligible PD-treated patients, among whom 30.3% had a pLOS (ie, longer than the average LOS, which was 16 days in our study). Among the models, the stacking model achieved the best calibration (ECI 8.691), balanced accuracy (Gm 0.690), accuracy (0.695), and specificity (0.701). Meanwhile, the stacking and RF models had the best overall performance (Brier score 0.174 for both) and discrimination (AUROC 0.757 for the stacking model and 0.756 for the RF model). Compared with the benchmark LR model, the stacking model was superior in all performance measures except sensitivity, but there was no significant difference in sensitivity between the 2 models. The 2-sided t tests revealed significant performance differences between the stacking and LR models in overall performance, discrimination, calibration, balanced accuracy, and accuracy.
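As a worked illustration of the outcome definition and some of the measures reported above, the following sketch labels stays longer than a threshold as pLOS and computes the Brier score, sensitivity, specificity, and Gm. The helper names, the toy data, and the 0.5 probability cutoff are hypothetical, and Gm is assumed here to be the geometric mean of sensitivity and specificity; the abstract does not state either detail.

```python
import math

def label_plos(los_days, threshold):
    """Return 1 for stays strictly longer than the threshold, else 0."""
    return [1 if d > threshold else 0 for d in los_days]

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def sens_spec_gm(y_true, y_prob, cutoff=0.5):
    """Sensitivity, specificity, and their geometric mean at a cutoff.

    The 0.5 cutoff is an assumption made for illustration.
    """
    y_pred = [1 if p >= cutoff else 0 for p in y_prob]
    tp = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 0)
    tn = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 0)
    fp = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)

# Toy data (hypothetical): lengths of stay in days and predicted pLOS risks.
los_days = [5, 20, 16, 30, 10, 17]
y_prob = [0.1, 0.8, 0.4, 0.3, 0.6, 0.9]

y_true = label_plos(los_days, threshold=16)  # 16 days was the study's average LOS
bs = brier_score(y_true, y_prob)
sens, spec, gm = sens_spec_gm(y_true, y_prob)
print(y_true, round(bs, 3), round(sens, 3), round(spec, 3), round(gm, 3))
```

Because Gm multiplies sensitivity by specificity, a model that scores well on only one of the two is penalized, which is why it serves as a balanced accuracy measure when only about 30% of patients are positive.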
This study is the first to develop data-driven pLOS prediction models for PD-treated patients using basic admission data from a national database. The results indicate that a stacking-based pLOS prediction model is feasible for PD-treated patients. The pLOS prediction tools developed in this study have the potential to help clinicians identify patients at a high risk of pLOS and allocate resources optimally for PD-treated patients.