Xu Fumin, Chen Xiao, Yin Xinru, Qiu Qiu, Xiao Jingjing, Qiao Liang, He Mi, Tang Liang, Li Xiawei, Zhang Qiao, Lv Yanling, Xiao Shili, Zhao Rong, Guo Yan, Chen Mingsheng, Chen Dongfeng, Wen Liangzhi, Wang Bin, Nian Yongjian, Liu Kaijun
Department of Gastroenterology, Daping Hospital, Army Medical University, Chongqing, People's Republic of China.
Department of Nuclear Medicine, Daping Hospital, Army Medical University, Chongqing, People's Republic of China.
Int J Gen Med. 2021 Apr 29;14:1589-1598. doi: 10.2147/IJGM.S294872. eCollection 2021.
Since December 2019, COVID-19 has spread throughout the world. Clinical outcomes of COVID-19 patients vary among infected individuals. Therefore, it is vital to identify patients at high risk of disease progression.
In this retrospective, multicenter cohort study, COVID-19 patients from Huoshenshan Hospital and Taikang Tongji Hospital (Wuhan, China) were included. Clinical features that differed significantly between the severe and nonsevere groups were identified by univariate analysis. These features were then used to train machine learning classifier models that predict whether a COVID-19 case would be severe or nonsevere. Two test sets, one from each hospital, were gathered to evaluate the predictive performance of the models.
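The univariate screening step can be illustrated with a minimal sketch. The abstract does not say which statistical test was used, so the Mann-Whitney U test (normal approximation, no tie or continuity correction), the 0.05 threshold, the feature names, and all values below are assumptions chosen for illustration, not the study's method or data.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(group_a, group_b):
    """Two-sided p-value for the Mann-Whitney U test (normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def screen_features(rows, labels, alpha=0.05):
    """Keep features whose values differ between severe (1) and nonsevere (0) cases."""
    selected = []
    for name in rows[0]:
        severe = [r[name] for r, y in zip(rows, labels) if y == 1]
        nonsevere = [r[name] for r, y in zip(rows, labels) if y == 0]
        if mann_whitney_p(severe, nonsevere) < alpha:
            selected.append(name)
    return selected


# Synthetic toy data (NOT from the study): 8 severe cases, then 8 nonsevere cases.
d_dimer = [2.1, 3.4, 1.9, 4.0, 2.8, 3.1, 2.5, 3.7,
           0.3, 0.5, 0.4, 0.6, 0.2, 0.7, 0.35, 0.45]
noise = [1, 4, 5, 8, 9, 12, 13, 16,
         2, 3, 6, 7, 10, 11, 14, 15]
labels = [1] * 8 + [0] * 8
rows = [{"d_dimer": d, "noise": n} for d, n in zip(d_dimer, noise)]

print(screen_features(rows, labels))
```

In this toy example the hypothetical `d_dimer` column separates the two groups and is kept, while the `noise` column is discarded. Only the features that pass such a screen would go on to model training.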
A total of 455 patients were included, and 21 features showing significant differences between the severe and nonsevere groups were selected for the training and validation set. The optimal subset, with eleven features in the k-nearest neighbor model, obtained the highest area under the curve (AUC) value among the four models in the validation set. D-dimer, CRP, and age were the three most important features in the optimal-feature subsets. The highest AUC value was obtained using a support vector machine model for a test set from Huoshenshan Hospital. Software for predicting disease progression based on machine learning was developed.
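To make the evaluation metric concrete, here is a minimal sketch of how a k-nearest neighbor classifier can score validation cases and how the AUC is computed from those scores. The value of k, the Euclidean distance, the two pre-scaled features, and every data point are assumptions for illustration; the abstract does not report these details.

```python
from math import dist


def knn_scores(train_x, train_y, query_x, k=3):
    """Score each query point by the fraction of severe cases among its k nearest training points."""
    scores = []
    for q in query_x:
        nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], q))[:k]
        scores.append(sum(train_y[i] for i in nearest) / k)
    return scores


def auc(y_true, scores):
    """AUC as the probability that a random severe case outscores a random nonsevere one (ties count half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy data (NOT from the study): two features already scaled to [0, 1].
train_x = [(0.9, 0.8), (0.8, 1.0), (1.0, 0.7), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3)]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = severe, 0 = nonsevere
val_x = [(0.85, 0.9), (0.15, 0.15), (0.7, 0.6), (0.3, 0.2)]
val_y = [1, 0, 1, 0]

val_scores = knn_scores(train_x, train_y, val_x, k=3)
print(val_scores, auc(val_y, val_scores))
```

Comparing models or feature subsets then amounts to repeating this scoring for each candidate and keeping the one with the highest validation AUC. Because the toy clusters are well separated, the AUC here is perfect, which real clinical data would rarely achieve.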
Machine learning-based predictive models were successfully established and, with optimal-feature subsets, achieved satisfactory performance in predicting disease progression.