Center for Research on Infectious Diseases, Instituto Nacional de Salud Pública, Cuernavaca, 62100, Mexico.
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, 04510, Mexico.
BMC Infect Dis. 2023 Jan 11;23(1):18. doi: 10.1186/s12879-022-07951-w.
Mexico ranks fifth worldwide in the number of deaths due to COVID-19. Identifying risk markers from easily accessible clinical data could support the initial triage of COVID-19 patients and help anticipate a fatal outcome, especially in the most socioeconomically disadvantaged regions. This study uses machine learning (ML) methods to identify markers that increase lethality risk in patients diagnosed with COVID-19. Markers were differentiated by sex and age group.
A total of 11,564 COVID-19 cases in Mexico were extracted from the Epidemiological Surveillance System for Viral Respiratory Disease. Four ML classification methods were trained to predict lethality, and an interpretability approach was used to identify the risk markers.
Models based on Extreme Gradient Boosting (XGBoost) performed best on the test set, achieving a sensitivity of 0.91, a specificity of 0.69, a positive predictive value of 0.344, and a negative predictive value of 0.965. For female patients, the leading markers were diabetes and arthralgia; for male patients, they were chronic kidney disease (CKD) and chest pain. Dyspnea, hypertension, and polypnea increased the risk of death in both sexes.
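For readers less familiar with the four reported metrics, the following minimal sketch shows how sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) are derived from a binary confusion matrix. The function name `diagnostic_metrics` and the counts used are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute four standard metrics from binary confusion-matrix counts.

    tp: deaths correctly predicted as deaths
    fp: survivors incorrectly predicted as deaths
    tn: survivors correctly predicted as survivors
    fn: deaths incorrectly predicted as survivors
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of actual deaths detected
        "specificity": tn / (tn + fp),  # share of actual survivors detected
        "ppv": tp / (tp + fp),          # share of predicted deaths that are real
        "npv": tn / (tn + fn),          # share of predicted survivors that are real
    }


# Hypothetical counts for illustration only (NOT taken from the study).
metrics = diagnostic_metrics(tp=90, fp=30, tn=70, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the outcome is in the evaluated sample, which is one reason a model can combine high sensitivity with a modest PPV when deaths are a minority of cases.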
ML-based models combined with an interpretability approach successfully identified risk markers for lethality by sex and age. Our results indicate that age is the strongest demographic factor for a fatal outcome, while all other markers were consistent with previous clinical trials conducted in a Mexican population. The markers identified here could be used for initial triage, especially in geographic areas with limited resources.