Wu Meixuan, Zhao Yaqian, Dong Xuhui, Jin Yue, Cheng Shanshan, Zhang Nan, Xu Shilin, Gu Sijia, Wu Yongsong, Yang Jiani, Yao Liangqing, Wang Yu
Department of Obstetrics and Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China.
Department of Obstetrics and Gynecology, Renji Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China.
Front Oncol. 2022 Sep 21;12:975703. doi: 10.3389/fonc.2022.975703. eCollection 2022.
Ovarian cancer (OC) is the most lethal gynecological malignancy, with limited early screening methods and a poor prognosis. Artificial intelligence technology has made major advances in cancer diagnosis.
We aimed to develop an interpretable machine learning (ML) prediction model, based on multiple biomarkers, for the diagnosis and prognosis of epithelial ovarian cancer (EOC).
A total of 521 patients with EOC and 144 patients with benign gynecological diseases were enrolled, forming derivation datasets and an external validation cohort. Predictions were generated by nine supervised ML methods using 34 parameters. The SHapley Additive exPlanations (SHAP) algorithm was used to explain the reasons behind the predictions of the best-performing ML model. In addition, the prognosis of EOC was analyzed by unsupervised clustering and Kaplan-Meier (KM) survival analysis.
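The Kaplan-Meier analysis mentioned above estimates a survival curve as a running product of conditional survival probabilities at each event time. The following is a minimal pure-Python sketch of the standard product-limit estimator. The function name `kaplan_meier` and the toy data are illustrative assumptions. The abstract does not state which software the study used.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death or recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) pairs.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)  # events per time
    leaving = Counter(times)  # events plus censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

For the toy data, the survival estimate drops to roughly 0.8, 0.6 and 0.3 at times 2, 3 and 5. Comparing such curves between clusters (for example, with a log-rank test) is what produces p-values like those reported for the poor-prognosis subgroups.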
ML methods outperformed conventional logistic regression in predicting EOC diagnosis, and XGBoost performed best in the external validation datasets. The AUC values for distinguishing EOC from benign disease, and for determining pathological type, grade and clinical stage, were 0.958 (0.926-0.989), 0.792 (0.701-0.8834), 0.819 (0.687-0.950) and 0.68 (0.573-0.788), respectively. For EOC patients with negative CA-125, the XGBoost model achieved an AUC of 0.835 (0.763-0.907). Unsupervised cluster analysis identified EOC subgroups with significantly poorer overall survival (p < 0.0001) and recurrence-free survival (p < 0.0001).
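Each AUC above is reported with an interval. One way to obtain such numbers is sketched below. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. Its uncertainty is estimated with a percentile bootstrap over patients. The bootstrap is an assumption: the abstract does not say how the intervals were derived, and methods such as DeLong's test are also common. The names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney statistic); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI, resampling patients with replacement."""
    point = auc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return point, (stats[lo_idx], stats[hi_idx])
```

Labels are assumed to be coded 1 for EOC and 0 for benign disease. The scores would be the model's predicted probabilities.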
Based on preoperative characteristics, we showed that ML algorithms can provide an acceptable diagnostic and prognostic prediction model for EOC patients. Moreover, SHAP analysis can improve the interpretability of ML models and contribute to precision medicine.
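SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory. To illustrate the idea only, the sketch below computes exact Shapley values for a small hypothetical model. It enumerates every feature coalition and replaces absent features with a baseline value, which is an assumed value function. This brute-force approach scales as 2^n and is not meant to reproduce the study's SHAP analysis. For tree ensembles such as XGBoost, the `shap` package's `TreeExplainer` computes these values efficiently. The function `shapley_values` and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction by enumerating feature coalitions.

    Features outside a coalition take their baseline value (an assumed,
    'interventional' value function chosen for this sketch).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical model: feature 0 acts alone, features 1 and 2 interact.
phi = shapley_values(lambda z: 2 * z[0] + z[1] * z[2], [1, 1, 1], [0, 0, 0])
```

Here the attributions come out to approximately [2.0, 0.5, 0.5]. The interaction term is split evenly between features 1 and 2, and the values sum to f(x) - f(baseline) = 3. This additivity is what lets SHAP explain an individual patient's prediction feature by feature.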