Ji Yu, Wang Kaipeng, Yuan Yuan, Wang Yueguo, Liu Qingyuan, Wang Yulan, Sun Jian, Wang Wenwen, Wang Huanli, Zhou Shusheng, Jin Kui, Zhang Mengping, Lai Yinglei
School of Mathematical Sciences, University of Science and Technology of China, Hefei, Anhui, 230026, China.
School of Mathematics and Statistics, Nanjing University of Science and Technology, Nanjing, Jiangsu, 210094, China.
Comput Biol Chem. 2024 Dec;113:108203. doi: 10.1016/j.compbiolchem.2024.108203. Epub 2024 Sep 2.
The prediction of sepsis, especially early diagnosis, has received significant attention in biomedical research. To improve current medical scoring systems and to overcome the class imbalance and limited sample size of local EHR (electronic health records) data, we propose a novel knowledge-transfer-based approach that combines a medical scoring system with an ordinal logistic regression model.
Medical scoring systems (i.e., NEWS, SIRS and QSOFA) are generally robust and useful for sepsis diagnosis. Machine-learning-based methods have been widely used to build prediction models from local EHR data, but their performance is often degraded by class imbalance and small sample sizes. Knowledge distillation and knowledge transfer have recently been proposed as ways of combining models to improve prediction performance and generalization. In this study, we developed a novel knowledge-transfer-based method that combines a medical scoring system (after a proposed score transformation) with an ordinal logistic regression model. We showed mathematically that the method is equivalent to a specific form of weighted regression, and we explored theoretically its effectiveness under class imbalance.
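The equivalence to weighted regression can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the score transformation nor the exact weights, so the mixing weight `lam`, the teacher distribution `teacher` (standing in for a transformed clinical score), and all numeric values below are hypothetical. The sketch only checks a generic fact: a cross-entropy loss against a mixture of the observed label and a teacher distribution equals a weighted negative log-likelihood over (sample, class) pairs.

```python
import numpy as np

def ordinal_probs(eta, cutpoints):
    """Proportional-odds class probabilities, with P(y <= k) = sigmoid(c_k - eta)."""
    eta = np.asarray(eta, dtype=float)[:, None]
    cuts = np.asarray(cutpoints, dtype=float)[None, :]
    cum = 1.0 / (1.0 + np.exp(-(cuts - eta)))
    n = eta.shape[0]
    # Pad with 0 and 1 so that differencing yields one probability per class.
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    return np.diff(cum, axis=1)

def soft_target_loss(probs, hard, teacher, lam):
    """Cross-entropy against the target (1 - lam) * one_hot(hard) + lam * teacher."""
    one_hot = np.eye(probs.shape[1])[hard]
    target = (1.0 - lam) * one_hot + lam * teacher
    return -np.sum(target * np.log(probs))

def weighted_loss(probs, hard, teacher, lam):
    """The same loss written as a weighted negative log-likelihood."""
    n, k = probs.shape
    total = 0.0
    for i in range(n):
        for c in range(k):
            # Weight of the pseudo-observation "sample i belongs to class c".
            w = lam * teacher[i, c] + (1.0 - lam) * float(c == hard[i])
            total -= w * np.log(probs[i, c])
    return total

# Hypothetical toy inputs: three patients, three ordered classes.
eta = [-1.0, 0.2, 1.5]            # linear predictors (assumed values)
cutpoints = [-0.5, 0.8]           # increasing thresholds (assumed values)
hard = np.array([0, 1, 2])        # observed ordinal labels
teacher = np.array([[0.7, 0.2, 0.1],
                    [0.2, 0.5, 0.3],
                    [0.1, 0.3, 0.6]])  # stand-in for a transformed score
lam = 0.3                         # hypothetical mixing weight

probs = ordinal_probs(eta, cutpoints)
loss_soft = soft_target_loss(probs, hard, teacher, lam)
loss_weighted = weighted_loss(probs, hard, teacher, lam)
```

In this formulation the teacher term adds weight to classes that the scoring system considers plausible, which is one way a scoring system could raise the influence of minority classes during fitting.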
Performance was measured by the VUS (the volume under the multi-dimensional ROC surface, a generalization of AUC-ROC to ordinal categories). On the local dataset and the MIMIC-IV dataset, the knowledge-transfer-based model built on the NEWS scoring system (ORNEWS) achieved VUS values of 0.384 and 0.339, respectively, whereas the traditional ordinal regression model (OR) achieved 0.352 and 0.322. Consistent results were observed for the knowledge-transfer-based models built on the SIRS and QSOFA scoring systems in the ordinal scenarios. Additionally, the predicted probabilities and the binary classification ROC curves of the knowledge-transfer-based models indicated that the approach raised the predicted probabilities for the minority classes and lowered them for the majority classes, which improved the AUCs and VUSs on imbalanced data.
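For readers unfamiliar with the VUS, the following is a minimal sketch of one standard empirical estimator for a scalar score: the fraction of tuples, one sample drawn from each ordered class, whose scores are strictly increasing in class order. The abstract does not state which estimator or tie handling the authors used, so the function name `empirical_vus` and the toy data are assumptions. Under this definition a random score gives an expected value of 1/K! for K classes (about 0.167 for K = 3).

```python
from itertools import product

def empirical_vus(scores, labels):
    """Fraction of cross-class tuples whose scores are strictly increasing.

    One sample is taken from each class, with classes sorted in ascending
    order. Ties count as incorrectly ordered in this sketch.
    """
    classes = sorted(set(labels))
    groups = [[s for s, y in zip(scores, labels) if y == c] for c in classes]
    total = 0
    ordered = 0
    # Brute-force enumeration: fine for small samples, costly for large ones.
    for combo in product(*groups):
        total += 1
        if all(a < b for a, b in zip(combo, combo[1:])):
            ordered += 1
    return ordered / total

# Toy data (hypothetical): three ordered classes, two samples each.
toy_labels = [0, 0, 1, 1, 2, 2]
perfect_scores = [0.1, 0.2, 0.5, 0.6, 0.9, 1.0]
noisy_scores = [0.1, 0.7, 0.5, 0.6, 0.9, 0.4]

vus_perfect = empirical_vus(perfect_scores, toy_labels)
vus_noisy = empirical_vus(noisy_scores, toy_labels)
```

With two classes this quantity reduces to the usual empirical AUC without tie correction.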
Knowledge transfer, which combines a medical scoring system and a machine-learning-based model, improves the prediction performance for early diagnosis of sepsis, especially in the scenarios of class imbalance and limited sample size.