Department of General Practice, First Affiliated Hospital, Zhejiang University School of Medicine, 310003, Hangzhou, China.
Clinical Research Institute, Zhejiang Provincial People's Hospital (Affiliated People's Hospital of Hangzhou Medical College), Hangzhou, China.
BMC Med Inform Decis Mak. 2024 Jan 24;24(1):24. doi: 10.1186/s12911-024-02426-1.
BACKGROUND AND AIMS: Sexually transmitted infections (STIs) are a significant global public health challenge because of their high incidence and the severe consequences that can follow when early intervention is neglected. Research shows an upward trend in the absolute number of STI cases and disability-adjusted life years (DALYs), with the age-standardized rate (ASR) of syphilis, chlamydia, trichomoniasis, and genital herpes increasing from 2010 to 2019. Machine learning (ML) offers significant advantages in disease prediction, and several studies have explored its potential for STI prediction. This study aims to build male-specific and female-specific STI risk prediction models with the CatBoost algorithm, using data from the National Health and Nutrition Examination Survey (NHANES) for training and validation, with sub-group analysis performed for each STI. The female sub-group analysis also includes human papillomavirus (HPV) infection.
METHODS: Male-specific and female-specific STI risk prediction models were built with the CatBoost algorithm on NHANES data. Data were collected from 12,053 participants aged 18 to 59 years, with general demographic characteristics and sexual behavior questionnaire responses used as features. The Adaptive Synthetic Sampling Approach (ADASYN) algorithm was used to address class imbalance, and 15 machine learning algorithms were evaluated before CatBoost was selected. The SHAP method was used to improve interpretability by identifying the importance of each feature in the model's STI risk predictions.
RESULTS: Among males, the CatBoost classifier achieved AUC values of 0.9995, 0.9948, 0.9923, 0.9996, and 0.9769 for predicting chlamydia, genital herpes, genital warts, gonorrhea, and overall STI infection, respectively. Among females, it achieved AUC values of 0.9971, 0.972, 0.9765, 1, 0.9485, and 0.8819 for predicting chlamydia, genital herpes, genital warts, gonorrhea, HPV, and overall STI infection, respectively. The top three predictors of overall STI risk in males were having sex with a new partner (per year), the number of times having sex without a condom (per year), and the lifetime number of female vaginal sex partners. In females, the top three predictors of overall STI risk were ever having had anal sex with a man, age, and the lifetime number of male vaginal sex partners.
CONCLUSIONS: This study demonstrated the effectiveness of the CatBoost classifier in predicting STI risk in both male and female populations. The SHAP algorithm revealed key predictors for each infection, highlighting demographic characteristics and sexual behaviors that are consistent across different STIs. These insights can guide targeted prevention strategies and interventions to reduce the impact of STIs on public health.
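For readers less familiar with the AUC metric reported in the results, the following is a minimal sketch of how an AUC can be computed from predicted risk scores. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. It uses the pairwise definition of AUC, namely the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
print(toy_auc)
```

Under this definition, an AUC of 1 means every infected case was scored above every uninfected case in the evaluation data, while 0.5 corresponds to random ranking. The quadratic pairwise loop is chosen for clarity; library implementations typically use a sort-based method that scales better.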