

Identifying Data-Driven Clinical Subgroups for Cervical Cancer Prevention With Machine Learning: Population-Based, External, and Diagnostic Validation Study.

Author information

Lu Zhen, Dong Binhua, Cai Hongning, Tian Tian, Wang Junfeng, Fu Leiwen, Wang Bingyi, Zhang Weijie, Lin Shaomei, Tuo Xunyuan, Wang Juntao, Yang Tianjie, Huang Xinxin, Zheng Zheng, Xue Huifeng, Xu Shuxia, Liu Siyang, Sun Pengming, Zou Huachun

Affiliations

School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, China.

Department of Gynecology, Laboratory of Gynecologic Oncology, Fujian Maternity and Child Health Hospital, College of Clinical Medicine for Obstetrics & Gynecology and Pediatrics, Fujian Medical University, Fuzhou, China.

Publication information

JMIR Public Health Surveill. 2025 Mar 19;11:e67840. doi: 10.2196/67840.

Abstract

BACKGROUND

Cervical cancer remains a major global health issue. Personalized, data-driven cervical cancer prevention (CCP) strategies tailored to phenotypic profiles may improve prevention and reduce disease burden.

OBJECTIVE

This study aimed to identify subgroups with differential cervical precancer or cancer risks using machine learning, validate subgroup predictions across datasets, and propose a computational phenomapping strategy to enhance global CCP efforts.

METHODS

We identified data-driven CCP subgroups by applying unsupervised machine learning to a deeply phenotyped, population-based discovery cohort. We estimated subgroup-specific risks of cervical intraepithelial neoplasia (CIN) and cervical cancer using weighted logistic regression, reporting odds ratios (ORs) with 95% CIs. We then trained a supervised machine learning model, developed pathways to classify individuals, and evaluated the model's diagnostic validity and usability in an external cohort.
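As a rough illustration of what an OR with a 95% CI expresses, the sketch below computes an odds ratio and its Wald interval from a single 2x2 table. It is a simplification and not the study's method: the abstract describes weighted logistic regression, whereas this example is unweighted and unadjusted. The function name `odds_ratio_wald_ci` and the cell counts are hypothetical, chosen only to exercise the calculation.

```python
import math
from statistics import NormalDist


def odds_ratio_wald_ci(a, b, c, d, level=0.95):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases in the subgroup of interest,  b: non-cases in that subgroup
    c: cases in the reference subgroup,    d: non-cases in the reference subgroup
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio (Woolf's formula).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    or_, lower, upper = odds_ratio_wald_ci(20, 80, 10, 90)
    print(f"OR {or_:.2f} [95% CI: {lower:.2f}-{upper:.2f}]")
```

The interval is built on the log scale and then exponentiated, which is why reported OR intervals are asymmetric around the point estimate.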

RESULTS

This study included 551,934 women (median age, 49 years) in the discovery cohort and 47,130 women (median age, 37 years) in the external cohort. Phenotyping identified 5 CCP subgroups, with CCP4 showing the highest carcinoma prevalence. CCP2-4 had significantly higher risks of CIN2+ (CCP2: OR 2.07 [95% CI: 2.03-2.12], CCP3: 3.88 [3.78-3.97], and CCP4: 4.47 [4.33-4.63]) and CIN3+ (CCP2: 2.10 [2.05-2.14], CCP3: 3.92 [3.82-4.02], and CCP4: 4.45 [4.31-4.61]) compared to CCP1 (P<.001), consistent with the direction of results observed in the external cohort. The proposed triple strategy was validated as clinically relevant, prioritizing high-risk subgroups (CCP3-4) for colposcopies and scaling human papillomavirus screening for CCP1-2.
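The reported intervals are narrow, as expected for a cohort of this size. A minimal sketch of how a reader might back out the implied log-odds ratio and its standard error from a reported OR and CI follows. It assumes the intervals are symmetric Wald intervals on the log scale, which the abstract does not state; the helper name `log_or_and_se` is hypothetical, and the input values are the CIN2+ estimates quoted above.

```python
import math
from statistics import NormalDist


def log_or_and_se(or_, lower, upper, level=0.95):
    """Recover the log odds ratio and its standard error from a reported
    OR and confidence interval, assuming a Wald interval on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.log(or_), (math.log(upper) - math.log(lower)) / (2 * z)


# CIN2+ estimates as reported in the abstract (reference subgroup: CCP1).
REPORTED_CIN2_PLUS = {
    "CCP2": (2.07, 2.03, 2.12),
    "CCP3": (3.88, 3.78, 3.97),
    "CCP4": (4.47, 4.33, 4.63),
}

if __name__ == "__main__":
    for subgroup, (or_, lower, upper) in REPORTED_CIN2_PLUS.items():
        beta, se = log_or_and_se(or_, lower, upper)
        print(f"{subgroup}: log-OR {beta:.3f}, implied SE {se:.4f}")
```

Under that assumption, a small implied standard error reflects the large sample more than the size of the effect, so the ORs themselves remain the quantity to compare across subgroups.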

CONCLUSIONS

This study underscores the potential of leveraging machine learning algorithms and large-scale routine electronic health records to enhance CCP strategies. By identifying key determinants of CIN2+/CIN3+ risk and classifying 5 distinct subgroups, our study provides a robust, data-driven foundation for the proposed triple strategy. This approach prioritizes tailored prevention efforts for subgroups with varying risks, offering a novel and scalable tool to complement existing cervical cancer screening guidelines. Future work should focus on independent external and prospective validation to maximize the global impact of this strategy.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c5b/11939026/96e83c27d7ac/publichealth-v11-e67840-g001.jpg
