China Australia Joint Research Centre for Infectious Diseases, School of Public Health, Xi'an Jiaotong University Health Science Centre, Xi'an, Shaanxi 710061, People's Republic of China; Department of Epidemiology and Biostatistics, School of Public Health, Nantong University, No.9 Seyuan Road, Chongchuan District, Nantong, Jiangsu 226019, People's Republic of China; Melbourne Sexual Health Centre, Alfred Health, Melbourne, Australia.
Melbourne Sexual Health Centre, Alfred Health, Melbourne, Australia; Central Clinical School, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC, Australia; The Kirby Institute, University of NSW, Sydney, Australia.
J Infect. 2021 Jan;82(1):48-59. doi: 10.1016/j.jinf.2020.11.007. Epub 2020 Nov 12.
OBJECTIVES: We aimed to develop machine learning models and evaluate their performance in predicting diagnoses of HIV and sexually transmitted infections (STIs) in a cohort of Australian men who have sex with men (MSM). METHODS: We collected clinical records of 21,273 Australian MSM during 2011-2017. We compared the accuracy of four machine learning approaches against a multivariable logistic regression (MLR) model in predicting HIV and STI (syphilis, gonorrhoea, chlamydia) diagnoses. RESULTS: Machine learning approaches consistently outperformed MLR. Gradient boosting machine (GBM) achieved the highest area under the receiver operating characteristic curve for HIV (76.3%) and STIs (syphilis, 85.8%; gonorrhoea, 75.5%; chlamydia, 68.0%), followed by extreme gradient boosting (71.1%, 82.2%, 70.3%, 66.4%), random forest (72.0%, 81.9%, 67.2%, 64.3%), deep learning (75.8%, 81.0%, 67.5%, 65.4%) and MLR (69.8%, 80.1%, 67.2%, 63.2%). In the GBM models, the ten most important predictors collectively explained 62.7-73.6% of the variation in predicting HIV/STIs. STI symptoms, past syphilis infection, age, time living in Australia, frequency of condom use with casual male sexual partners during receptive anal sex and the number of casual male sexual partners in the past 12 months were the most commonly identified predictors. CONCLUSIONS: Machine learning approaches are advantageous over multivariable logistic regression models in predicting HIV/STI diagnoses.
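The models in the abstract are ranked by area under the receiver operating characteristic curve. As a rough illustration of what that metric measures, the sketch below computes it with the rank (Mann-Whitney) formulation: the probability that a randomly chosen diagnosed case receives a higher predicted risk than a randomly chosen undiagnosed case. The function name `roc_auc`, the two "models" and all numbers are hypothetical and made up for illustration; none of them come from the study, and this is not the authors' code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy, made-up data (NOT from the study): 1 = diagnosed, 0 = not diagnosed.
toy_labels = [1, 0, 1, 0, 0, 1]
model_a_scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]  # hypothetical model A
model_b_scores = [0.6, 0.5, 0.3, 0.4, 0.2, 0.7]  # hypothetical model B

auc_a = roc_auc(toy_labels, model_a_scores)
auc_b = roc_auc(toy_labels, model_b_scores)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is how percentages such as 76.3% versus 69.8% can be read. The pairwise loop is quadratic in the number of cases, so for a cohort of the size described a sort-based or library implementation would be the more practical choice.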