Department of Health Information Technology and Management, Medical Informatics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Obstetrics and Gynecology, Cancer Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
BMC Cancer. 2023 Apr 13;23(1):341. doi: 10.1186/s12885-023-10808-3.
Cervical cancer is a common malignant tumor of the female reproductive system and a leading cause of mortality in women worldwide. Time-to-event analysis, which is central to clinical research, can be carried out effectively with survival prediction methods. This study systematically reviews the use of machine learning to predict survival in patients with cervical cancer.
An electronic search of the PubMed, Scopus, and Web of Science databases was performed on October 1, 2022. All articles retrieved from the databases were collected in an Excel file, and duplicates were removed. The articles were screened twice by title and abstract and then checked against the inclusion and exclusion criteria. The main inclusion criterion was the use of machine learning algorithms to predict cervical cancer survival. The information extracted from each article included authors, publication year, dataset details, survival type, evaluation criteria, machine learning models, and the algorithm execution method.
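The duplicate-removal step described above could be sketched as follows. This is only an illustration: the authors report using an Excel file, and the record fields (`source`, `doi`, `title`), the helper names, and the sample records here are all hypothetical.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each DOI, or for each normalized
    title when the DOI is missing.

    Limitation of this sketch: a record with a DOI and a copy of the same
    article without one are not matched to each other.
    """
    seen = set()
    unique = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        key = ("doi", doi) if doi else ("title", normalize_title(rec["title"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical search results exported from the three databases.
records = [
    {"source": "PubMed", "doi": "10.1000/example.1",
     "title": "Survival prediction with random forest"},
    {"source": "Scopus", "doi": "10.1000/EXAMPLE.1",
     "title": "Survival Prediction with Random Forest."},
    {"source": "Web of Science", "doi": "",
     "title": "Deep learning for cervical cancer survival"},
    {"source": "Scopus", "doi": None,
     "title": "Deep Learning for Cervical Cancer Survival"},
]

unique_records = deduplicate(records)
print(len(unique_records), "unique records")
```

Matching on DOI first and falling back to a normalized title is one common choice; title and abstract screening would still be done by hand afterwards.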
A total of 13 articles were included in this study, most of which were published from 2018 onwards. The most common machine learning models were random forest (6 articles, 46%), logistic regression (4 articles, 30%), support vector machines (3 articles, 23%), ensemble and hybrid learning (3 articles, 23%), and deep learning (3 articles, 23%). Sample sizes in the included studies ranged from 85 to 14,946 patients, and the models were internally validated in all but two articles. The reported area under the curve (AUC) ranged from 0.40 to 0.99 for overall survival, from 0.56 to 0.88 for disease-free survival, and from 0.67 to 0.81 for progression-free survival. Finally, 15 variables with an effective role in predicting cervical cancer survival were identified.
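For readers less familiar with the AUC values reported above, the following minimal sketch computes the area under the ROC curve as the fraction of (event, non-event) pairs that a model ranks correctly, with ties counted as half. The function name `roc_auc` and the labels and scores are hypothetical and are not taken from any of the reviewed studies.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for patients with the event, 0 for patients without it.
    scores: predicted risk, where higher means the event is more likely.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted risks for five patients.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Under this definition 0.5 corresponds to random ranking, so a value such as 0.40 means the model ordered patients worse than chance on that dataset.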
Combining heterogeneous multidimensional data with machine learning techniques can play a very influential role in predicting cervical cancer survival. Despite the benefits of machine learning, interpretability, explainability, and imbalanced datasets remain among the biggest challenges. Establishing machine learning algorithms as a standard for survival prediction requires further study.