Comparative study of machine learning and statistical survival models for enhancing cervical cancer prognosis and risk factor assessment using SEER data.

Affiliation

Department of Mathematics, School of Advanced Sciences, Vellore Institute of Technology, Vellore, Tamil Nadu, India.

Publication information

Sci Rep. 2024 Sep 27;14(1):22203. doi: 10.1038/s41598-024-72790-5.

Abstract

Cervical cancer is a common malignant tumor of the female reproductive system and one of the leading causes of cancer death among women worldwide. Survival prediction methods can be used to analyze time-to-event data effectively, which is essential in clinical studies. This study aims to bridge the gap between traditional statistical methods and machine learning in survival analysis by identifying which techniques are most effective for predicting survival, with particular emphasis on improving prediction accuracy and identifying key risk factors for cervical cancer. Women diagnosed with cervical cancer between 2013 and 2015 were included in the study, using data from the Surveillance, Epidemiology, and End Results (SEER) database. Using this dataset, the study assesses the performance of Weibull models, Cox proportional hazards models, and Random Survival Forests (RSF) in terms of predictive accuracy and risk factor identification. The findings show that machine learning models, particularly RSF, outperform traditional statistical methods in both predictive accuracy and the identification of crucial prognostic factors, underscoring the advantages of machine learning for handling complex survival data. However, for survival datasets with only a small number of predictors, statistical models should be tried first. Overall, RSF models enhance survival analysis with more accurate predictions and insights into survival risk factors, but larger datasets and further research on model interpretability and clinical applicability are needed.
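The abstract compares Weibull, Cox, and RSF models on predictive accuracy but does not name the metric here. A common choice for censored survival data is Harrell's concordance index (C-index): the fraction of comparable patient pairs in which the patient who died earlier was assigned the higher predicted risk. The sketch below assumes that metric and uses a hypothetical function name, `harrell_c_index`; it is an illustration, not the authors' evaluation code, and it simply skips pairs with tied follow-up times.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simple O(n^2) sketch).

    times:       observed follow-up times
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output where a higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i has the shorter observed time.
        if times[j] < times[i]:
            i, j = j, i
        # Comparable only if times differ and the earlier time is an observed death;
        # tied times are skipped in this simplified version.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # earlier death was ranked as higher risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions earn half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented for illustration): patient 3 is censored at t = 15.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.1]))  # perfectly ordered risks
print(harrell_c_index(times, events, [0.1, 0.7, 0.5, 0.9]))  # mostly misordered risks
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The risk scores could come from any of the three models, for example a Cox model's linear predictor or an RSF's ensemble mortality; which score the study actually used is not stated in this abstract.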

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7350/11437206/e6d925b397e9/41598_2024_72790_Fig1_HTML.jpg