Chen Hao, Chen Fanxuan, Wang Yijun, Cai Enna, Pan Wangzheng, Li Yichen, Mo Zefei, Lou Hao, Ren Chufan, Dai Chenyue, Shan Xingbo, Ye Hui, Xu Zhenwei, Dong Pu, Zhou Han, Xu Shuya, Zhu Tianye, Su Mingzhi, Miao Xingguo, Hu Xiaoqu, Hong Liang, Wang Yi, Su Feifei
Department of Infectious Diseases, Wenzhou Central Hospital, Wenzhou, China.
The First School of Medicine, School of Information and Engineering, Wenzhou Medical University, Wenzhou, China.
J Cell Mol Med. 2025 Mar;29(6):e70497. doi: 10.1111/jcmm.70497.
Opportunistic infections (OIs) are the leading cause of hospitalisation and mortality among patients infected with human immunodeficiency virus (HIV). The diversity of pathogens and the complexity of the associated clinical manifestations make timely diagnosis of these infections difficult. This study aims to use machine learning to develop a diagnostic model that quickly identifies whether an HIV-infected patient has an OI of any type, rather than one specific infection, so that it can be applied across clinical scenarios. In this retrospective cohort study, clinical data were collected from HIV-infected patients at four healthcare organisations in China. Twelve machine learning classification algorithms were trained and evaluated. Features were then ranked by importance, and the feature set was reduced to eliminate redundant features. Two five-feature sets, one selected by Shapley additive explanations (procalcitonin, haemoglobin, lymphocyte, creatinine, platelet) and one selected by Permutation Importance (procalcitonin, lymphocyte, haemoglobin, creatinine, indirect bilirubin), each achieved their highest F1 score with the adaptive boosting classifier. Their test-set F1 scores were 0.9016 and 0.9063, respectively, outperforming the best 32-feature model, a gradient boosting classifier, whose test-set F1 score was 0.8991.
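The feature-reduction step described above (rank features by importance, keep the top few, score by F1) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the data are synthetic, the feature names are placeholders, and `toy_predict` is a hypothetical rule-based stand-in for a trained classifier. It is not the study's adaptive boosting model, its data, or its code.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in F1 when one feature column is shuffled, for each feature."""
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = f1_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def top_k_features(names, importances, k):
    """Names of the k features with the largest importance."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Synthetic data: four placeholder features, only the first two drive the label.
data_rng = random.Random(42)
feature_names = ["feature_0", "feature_1", "feature_2", "feature_3"]
X = [[data_rng.random() for _ in feature_names] for _ in range(200)]


def toy_predict(row):
    # Hypothetical stand-in for a fitted model's predict function.
    return 1 if row[0] + 0.5 * row[1] > 0.75 else 0


y = [toy_predict(row) for row in X]

importances = permutation_importance(toy_predict, X, y)
selected = top_k_features(feature_names, importances, k=2)
print(selected)
```

Because the stand-in model ignores `feature_2` and `feature_3`, shuffling them leaves the F1 score unchanged and their importance is zero, which is the sense in which a feature is treated as redundant here. In practice the same ranking would be computed on a fitted classifier and held-out data.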