Cervical cancer prediction using machine learning models based on routine blood analysis.

Authors

Su Jie, Lu Hui, Zhang Ruihuan, Cui Na, Chen Chao, Si Qin, Song Biao

Affiliations

Medical neurobiology laboratory, Inner Mongolia Medical University, Huhhot, 010030, China.

College of Computer Science, Inner Mongolia University, Hohhot, 010021, China.

Publication information

Sci Rep. 2025 Jul 2;15(1):22655. doi: 10.1038/s41598-025-08166-0.

DOI:10.1038/s41598-025-08166-0

PMID:40594680

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12216743/

Abstract

Cervical cancer (CC) is the fourth most common cancer among women globally. The key to preventing and treating CC is early detection, diagnosis, and treatment. This study aimed to develop an interpretable model for predicting CC risk using routine blood data. The primary endpoint variable is the occurrence of CC, as confirmed by histopathological diagnosis. We used the SHapley Additive exPlanations (SHAP) method to provide interpretability and identify key factors associated with CC. In this retrospective study, medical records of patients from 2013 to 2023 were collected. A total of 2,503 patients diagnosed with CC were included in the case group, while the control group comprised 3,794 patients without apparent signs of the disease, including women with other gynecological conditions as well as healthy individuals undergoing routine check-ups. Age, clinical diagnosis information, and 22 blood cell analysis results were considered. Four different algorithms were applied to construct a model for estimating the likelihood of CC occurrence. Using the least absolute shrinkage and selection operator (LASSO) and the random forest (RF) method, 15 key routine blood features were ultimately selected from an initial set of 23 features for model training. These features include age, red blood cell count (RBC), platelet distribution width (PDW), white blood cell count (WBC), lymphocyte percentage (LYMPH%), basophil count (BASO), basophil percentage (BASO%), lymphocyte absolute value (LYMPH), neutrophil percentage (NEUT%), hemoglobin (HGB), mean corpuscular hemoglobin concentration (MCHC), red cell distribution width (R-CV), mean platelet volume (MPV), and plateletcrit (PCT). Among the four models, the extreme gradient boosting (XGBoost) model achieved the highest predictive performance, with an area under the curve (AUC) of 0.964. In contrast, the RF model exhibited the poorest generalization ability, with an AUC of 0.907.
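
The AUC values quoted above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc`, the labels, and the scores are illustrative assumptions on toy data; they are not the study's code or results.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]  # hypothetical model A
scores_b = [0.9, 0.3, 0.4, 0.5, 0.6, 0.2, 0.1]  # hypothetical model B

auc_a = roc_auc(labels, scores_a)  # 11 of 12 case/control pairs ranked correctly
auc_b = roc_auc(labels, scores_b)  # 8 of 12 pairs ranked correctly
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of samples, so it is meant for illustration. For a cohort the size of the one described here, a rank-based implementation from an established library would be the practical choice.
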
The SHAP importance ranking identified the top six predictors of CC, with the average platelet distribution width (PDW) recognized as the most important predictor of CC occurrence (the primary endpoint variable).
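
To illustrate what a SHAP-style attribution measures, the sketch below computes exact Shapley values for a tiny hand-written scoring function by averaging each feature's marginal contribution over all feature orderings. Everything in it is an assumption made for illustration: `toy_model`, its coefficients, the three feature names, and the input values are hypothetical and are not the study's model or data. The SHAP library itself relies on faster algorithms for tree models rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature that is "absent" from a coalition takes its baseline value.
    The cost grows as n!, so this is only feasible for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i from baseline to x
            new = predict(current)
            phi[i] += new - prev       # marginal contribution in this ordering
            prev = new
    return [p / len(orders) for p in phi]


def toy_model(features):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    pdw, age, wbc = features
    return 0.5 * pdw + 0.02 * age + 0.1 * pdw * wbc


feature_names = ["PDW", "age", "WBC"]
x = [16.0, 50.0, 7.0]          # hypothetical patient
baseline = [12.0, 40.0, 6.0]   # hypothetical reference values

phi = shapley_values(toy_model, x, baseline)
for name, value in sorted(zip(feature_names, phi), key=lambda t: -abs(t[1])):
    print(f"{name}: {value:+.2f}")
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that gives the method its name. Ranking features by the magnitude of their attributions, averaged over many patients, is how an importance ordering such as the one described above is typically obtained.
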

Similar articles

Supervised Machine Learning Models for Predicting Sepsis-Associated Liver Injury in Patients With Sepsis: Development and Validation Study Based on a Multicenter Cohort Study.

J Med Internet Res. 2025 May 26;27:e66733. doi: 10.2196/66733.

Interpretable machine learning for predicting isolated basal septal hypertrophy.

PLoS One. 2025 Jun 30;20(6):e0325992. doi: 10.1371/journal.pone.0325992. eCollection 2025.

Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?

Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.

Interpretable XGBoost model identifies idiopathic central precocious puberty in girls using four clinical and imaging features.

BMC Endocr Disord. 2025 Jul 1;25(1):159. doi: 10.1186/s12902-025-01983-4.

Construction and validation of HBV-ACLF bacterial infection diagnosis model based on machine learning.

BMC Infect Dis. 2025 Jul 1;25(1):847. doi: 10.1186/s12879-025-11199-5.

Development of machine learning model for predicting prolonged operation time in lumbar stenosis undergoing posterior lumbar interbody fusion: a multicenter study.

Spine J. 2025 Mar;25(3):460-473. doi: 10.1016/j.spinee.2024.10.001. Epub 2024 Oct 19.

Predicting Early-Onset Colorectal Cancer in Individuals Below Screening Age Using Machine Learning and Real-World Data: Case Control Study.

JMIR Cancer. 2025 Jun 19;11:e64506. doi: 10.2196/64506.

Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.

Serum calcium-based interpretable machine learning model for predicting anastomotic leakage after rectal cancer resection: A multi-center study.

World J Gastroenterol. 2025 May 21;31(19):105283. doi: 10.3748/wjg.v31.i19.105283.
