Guo QingYong, Wang Jinji, Chen Ru, Hu LiPing, You Wenqiang
Obstetrics & Gynecology, Fujian Maternity and Child Health Hospital College of Clinical Medicine for Obstetrics & Gynecology and Pediatrics, Fujian Medical University, Fuzhou, Fujian, China.
Medical Record Statistics, Fujian Maternity and Child Health Hospital College of Clinical Medicine for Medical Record Statistics, Fujian Medical University, Fuzhou, Fujian, China.
Front Oncol. 2025 Jul 2;15:1527674. doi: 10.3389/fonc.2025.1527674. eCollection 2025.
Ovarian cancer (OC) remains a highly lethal gynecological malignancy, often diagnosed at advanced stages with a poor prognosis. Lymph node involvement is a critical prognostic factor and significantly influences treatment planning. However, accurately predicting lymph node positivity remains challenging due to the disease's heterogeneity and the limitations of traditional models in handling high-dimensional and imbalanced data.
We conducted a retrospective analysis of the SEER database (2000-2021), including 26,844 OC patients with complete clinical information. Several machine learning algorithms were compared, and XGBoost performed best. SMOTE was used to address class imbalance, and LASSO regression was used to select key predictors such as tumor size, histology, chemotherapy, and surgery. Model performance was assessed by accuracy, sensitivity, specificity, F1 score, and AUC, and external validation was performed on an independent cohort from Fujian Provincial Maternity and Children's Hospital.
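The evaluation metrics named above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are hypothetical and serve only to show how each metric is defined for a binary outcome (1 = lymph node positive).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 for binary labels (1 = node-positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Sensitivity, specificity, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are usually reported together.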
The XGBoost model achieved an AUC of 0.98 (95% CI: 0.975-0.986) in the training set and 0.847 (95% CI: 0.823-0.871) in external validation. The model demonstrated high sensitivity and robust performance in identifying lymph node-positive cases. Tumor size ≥5 cm, histological subtype, and chemotherapy were key predictive features, with SHAP analysis identifying tumor size as the most influential factor.
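The abstract does not state how the 95% confidence intervals for the AUC were obtained. One common approach is a percentile bootstrap, sketched below under that assumption; the function names, the number of resamples, the seed, and the toy data are all hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not necessarily the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    estimates = []
    for _ in range(n_boot):
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        estimates.append(auc(ys, [scores[i] for i in idx]))
    if not estimates:
        raise ValueError("no valid bootstrap resamples")
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lo, hi


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
point = auc(y_true, scores)
lo, hi = bootstrap_auc_ci(y_true, scores)
print(f"AUC = {point:.3f}, 95% CI: {lo:.3f}-{hi:.3f}")
```

With only eight toy cases the interval is very wide; cohorts of the size reported here would give much narrower intervals, consistent with the tight ranges quoted above.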
We present the first machine learning model specifically developed for predicting lymph node positivity in OC, validated across large, diverse cohorts. To facilitate clinical translation, we developed a free, user-friendly online calculator, which allows clinicians to quickly estimate lymph node positivity risk using patient-specific clinical parameters. This tool can be accessed at http://127.0.0.1:6818 and serves as a practical, evidence-based aid to support individualized treatment decisions and potentially improve patient outcomes. Future studies should integrate molecular data and broaden external validation to enhance generalizability.