Application of machine learning techniques for predicting survival in ovarian cancer.

Author affiliations

Department of Computer Engineering, Urmia University, Urmia, Iran.

Center for Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark.

Publication information

BMC Med Inform Decis Mak. 2022 Dec 30;22(1):345. doi: 10.1186/s12911-022-02087-y.

DOI:10.1186/s12911-022-02087-y

PMID:36585641

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9801354/

Abstract

BACKGROUND

Ovarian cancer is the fifth leading cause of cancer-related mortality among women in the United States and is often called a "forgotten cancer" or "silent disease." The survival of ovarian cancer patients depends on several factors, including the treatment process and the prognosis.

METHODS

The ovarian cancer patient dataset is compiled from the Surveillance, Epidemiology, and End Results (SEER) database. With the help of a clinician, the dataset is curated and the most relevant features are selected. Pearson's second coefficient of skewness is used to evaluate the skewness of the dataset, and the Pearson correlation coefficient is used to investigate the associations between features. A statistical test is used to evaluate the significance of the features. Six Machine Learning (ML) models, namely K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost), are implemented for survival prediction in both classification and regression approaches. An interpretability method, Shapley Additive Explanations (SHAP), is applied to clarify the decision-making process and determine the importance of each feature in prediction. Additionally, individual DTs from the RF model are displayed to show how the model predicts the survival intervals.
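The two descriptive statistics named in the methods can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not say which variant was used. Pearson's second skewness coefficient is 3 × (mean − median) / standard deviation, and the Pearson correlation coefficient is the covariance of two features divided by the product of their standard deviations.

```python
from math import sqrt
from statistics import mean, median, pstdev


def pearson_second_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std.

    Uses the population standard deviation (an assumption; the abstract
    does not specify population vs. sample).
    """
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("skewness is undefined for constant data")
    return 3 * (mean(values) - median(values)) / spread


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not taken from SEER.
    age_at_diagnosis = [45, 52, 58, 61, 63, 67, 70, 88]
    survival_months = [96, 80, 75, 60, 58, 40, 35, 12]
    print("skewness of age:", pearson_second_skewness(age_at_diagnosis))
    print("age vs. survival r:", pearson_correlation(age_at_diagnosis, survival_months))
```

A positive skewness value indicates a longer right tail (mean above median), and a correlation near ±1 between two features suggests they carry largely redundant information.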

RESULTS

Our results show that RF (accuracy = 88.72%, AUC = 82.38%) and XGBoost (Root Mean Square Error (RMSE) = 20.61%, R = 0.4667) perform best for the classification and regression approaches, respectively. Furthermore, using the SHAP method together with DTs extracted from the RF model, the most important features in the dataset are identified. Histologic type (ICD-O-3), chemotherapy recode, year of diagnosis, age at diagnosis, tumor stage, and grade are the most important determinants in survival prediction.
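For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, AUC, RMSE, and a coefficient of determination are commonly computed. It is an illustration under stated assumptions rather than the study's evaluation code: all function names are hypothetical, the AUC shown is the binary form (the abstract does not say how AUC was computed over survival intervals), and it assumes the reported "R" refers to the coefficient of determination, which the abstract does not confirm.

```python
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Binary AUC: probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return sqrt(squared / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the true values are constant")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical labels, scores, and survival times for illustration only.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(rmse([10, 20, 30], [12, 18, 33]))
    print(r_squared([10, 20, 30], [12, 18, 33]))
```

Accuracy and AUC apply to the classification approach (predicting a survival interval), while RMSE and the coefficient of determination apply to the regression approach (predicting survival time directly).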

CONCLUSION

To the best of our knowledge, this is the first study to develop multiple ML models for predicting ovarian cancer patients' survival on the SEER database using both classification and regression approaches. These ML algorithms achieve more accurate results than statistical methods. It is also the first study to use the SHAP method to make the proposed models' predictions more transparent and to increase clinicians' confidence in them. As an automated auxiliary tool, the developed models can help clinicians better understand the estimated survival and the features that most affect it.
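The SHAP method referred to above builds on Shapley values from cooperative game theory: each feature's contribution to one prediction is its average marginal effect over all subsets of the other features. The sketch below computes exact Shapley values by brute force for a tiny model. It is only a conceptual illustration: the toy risk function, its coefficients, and the single-baseline treatment of "absent" features are all assumptions, and the SHAP library itself relies on faster approximations and background datasets rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value
    (a simplifying assumption for this sketch).
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                contributions[i] += weight * (value(members | {i}) - value(members))
    return contributions


def toy_risk_score(features):
    """Hypothetical model: not derived from the study or from SEER data."""
    age, stage, chemo = features
    return 0.02 * age + 0.5 * stage - 0.3 * chemo + 0.1 * stage * chemo


if __name__ == "__main__":
    patient = [70, 3, 1]    # hypothetical age, stage, chemotherapy flag
    reference = [60, 1, 0]  # hypothetical baseline patient
    phi = shapley_values(toy_risk_score, patient, reference)
    print("contributions:", phi)
    # The contributions sum to the difference between the two predictions.
    print(sum(phi), toy_risk_score(patient) - toy_risk_score(reference))
```

The cost grows exponentially with the number of features, which is why practical tools approximate these values for models with many inputs.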

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/948d/9801579/0cb781091bed/12911_2022_2087_Fig1_HTML.jpg
