Building gender-specific sexually transmitted infection risk prediction models using CatBoost algorithm and NHANES data.

Author affiliations

Department of General Practice, First Affiliated Hospital, Zhejiang University School of Medicine, 310003, Hangzhou, China.

Clinical Research Institute, Zhejiang Provincial People's Hospital (Affiliated People's Hospital of Hangzhou Medical College), Hangzhou, China.

Publication information

BMC Med Inform Decis Mak. 2024 Jan 24;24(1):24. doi: 10.1186/s12911-024-02426-1.


DOI: 10.1186/s12911-024-02426-1
PMID: 38267946
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10809625/
Abstract

BACKGROUND AND AIMS: Sexually transmitted infections (STIs) are a significant global public health challenge because of their high incidence and the potential for severe consequences when early intervention is neglected. Research shows an upward trend in the absolute number of cases and in disability-adjusted life years (DALYs) attributable to STIs, with the age-standardized rates (ASRs) of syphilis, chlamydia, trichomoniasis, and genital herpes increasing from 2010 to 2019. Machine learning (ML) offers significant advantages in disease prediction, and several studies have explored its potential for STI prediction. This study aimed to build male-specific and female-specific STI risk prediction models with the CatBoost algorithm, using data from the National Health and Nutrition Examination Survey (NHANES) for training and validation, with sub-group analysis performed for each STI. The female sub-group also includes human papillomavirus (HPV) infection.

METHODS: Data were drawn from 12,053 NHANES participants aged 18 to 59 years, with general demographic characteristics and sexual behavior questionnaire responses used as features. The Adaptive Synthetic Sampling Approach (ADASYN) was used to address class imbalance, and 15 machine learning algorithms were evaluated before CatBoost was selected. The SHAP method was used to improve interpretability by identifying feature importance in the models' STI risk predictions.

RESULTS: Among males, the CatBoost classifier achieved AUC values of 0.9995, 0.9948, 0.9923, 0.9996, and 0.9769 for predicting chlamydia, genital herpes, genital warts, gonorrhea, and overall STI infection, respectively. Among females, it achieved AUC values of 0.9971, 0.972, 0.9765, 1, 0.9485, and 0.8819 for predicting chlamydia, genital herpes, genital warts, gonorrhea, HPV, and overall STI infection, respectively. The top three predictors of overall STI risk in males were having sex with a new partner (per year), the number of times having sex without a condom (per year), and the number of female vaginal sex partners (lifetime). In females, the top three predictors of overall STI risk were ever having anal sex with a man, age, and the number of male vaginal sex partners (lifetime).

CONCLUSIONS: This study demonstrated the effectiveness of the CatBoost classifier in predicting STI risk in both male and female populations. The SHAP analysis revealed key predictors for each infection, highlighting demographic characteristics and sexual behaviors that were consistent across different STIs. These insights can guide targeted prevention strategies and interventions to reduce the impact of STIs on public health.
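The AUC values reported in the abstract can be read as the probability that a randomly chosen infected participant receives a higher predicted risk than a randomly chosen uninfected one. The sketch below computes AUC directly from that definition in plain Python. It is an illustration only, not the authors' evaluation code: the function name roc_auc and the toy labels and scores are hypothetical and are not taken from NHANES or the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (1 = infected, 0 = not infected).
toy_labels = [0, 0, 1, 1]
perfect_scores = [0.1, 0.2, 0.8, 0.9]  # every positive scores above every negative
mixed_scores = [0.1, 0.8, 0.2, 0.9]    # one positive falls below one negative

auc_perfect = roc_auc(toy_labels, perfect_scores)  # 4 of 4 pairs correct
auc_mixed = roc_auc(toy_labels, mixed_scores)      # 3 of 4 pairs correct
```

Under this definition, an AUC of 1, as reported for gonorrhea among females, means every infected participant in the evaluation data was scored above every uninfected one. The pairwise loop is quadratic in the sample size; for large data sets a rank-based implementation is the usual choice.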

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4efa/10809625/7b23d49205cd/12911_2024_2426_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4efa/10809625/093018d077ba/12911_2024_2426_Fig2_HTML.jpg
