Evaluating key predictors of breast cancer through survival: a comparison of AFT frailty models with LASSO, ridge, and elastic net regularization.

Authors

Bosson-Amedenu Senyefia, Ayitey Emmanuel, Ayiah-Mensah Francis, Asare Luyton

Affiliation

Department of Mathematics, Statistics and Actuarial Science, Takoradi Technical University, Sekondi-Takoradi, Ghana.

Publication

BMC Cancer. 2025 Apr 11;25(1):665. doi: 10.1186/s12885-025-14040-z.

DOI:10.1186/s12885-025-14040-z
PMID:40217202
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11987402/
Abstract

BACKGROUND

Frailty models are extensively utilized in survival analysis to address unobserved heterogeneity among individuals. However, selecting the most robust model for survival prediction, especially in the context of high-dimensional data, continues to pose a challenge. This study evaluates the performance of various Accelerated Failure Time (AFT) frailty models and examines the influence of regularization techniques, including LASSO, Ridge, and Elastic Net, on model selection and prediction accuracy.
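
For orientation, a generic AFT frailty model and an elastic-net style penalty can be written as below. This is a textbook-style sketch rather than the authors' exact specification: the symbols $T_{ij}$, $\mathbf{x}_{ij}$, $w_i$, $\sigma$, $\varepsilon_{ij}$, $\lambda$, and $\alpha$ are assumptions introduced here, and the abstract does not state which parameterization the study used.

```latex
% Generic AFT frailty model (assumed notation):
%   T_{ij}  survival time of subject j in cluster i
%   x_{ij}  covariate vector, beta its coefficients
%   w_i     frailty (random effect) capturing unobserved heterogeneity
%   sigma   scale parameter, eps_{ij} an error term whose distribution
%           (e.g. extreme value, logistic, normal) names the model
\log T_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + w_i + \sigma\,\varepsilon_{ij}

% Penalized estimation with log-likelihood \ell(\beta):
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}}
    \Bigl\{ -\ell(\boldsymbol{\beta})
      + \lambda \Bigl[ \alpha \lVert \boldsymbol{\beta} \rVert_1
      + \tfrac{1-\alpha}{2} \lVert \boldsymbol{\beta} \rVert_2^2 \Bigr] \Bigr\},
  \qquad \lambda \ge 0,\; 0 \le \alpha \le 1
% alpha = 1 gives LASSO, alpha = 0 gives Ridge, 0 < alpha < 1 gives Elastic Net.
```

Under this form, the $\ell_1$ term is what allows LASSO to set some coefficients exactly to zero, while the $\ell_2$ term only shrinks them toward zero.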

METHODS

We used both simulated datasets and a real breast cancer dataset to compare seven Accelerated Failure Time (AFT) frailty models: Weibull, Log-logistic, Gamma, Gompertz, Log-normal, Generalized Gamma, and the Extreme Value Frailty AFT model. Model performance was evaluated using the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Mean Absolute Error (MAE), and Mean Squared Error (MSE) across three sample sizes (25%, 50%, and 75%). To improve parameter estimation and reduce overfitting in high-dimensional survival data, we applied three regularization methods: LASSO, Ridge, and Elastic Net.

RESULTS

The Extreme Value Frailty AFT model consistently outperformed all other models across the sample sizes, with the lowest AIC, BIC, MAE, and MSE values, indicating superior fit and predictive accuracy. Its AIC ranged from 100.41 at the 25% sample size to 384.58 at the 75% sample size, consistently lower than that of the second-best model, the Log-logistic. The forest plot analysis further validates the strong impact of the significant covariates. LASSO regularization also improved the model's parsimony by eliminating non-informative covariates, such as Age, PR, and Hospitalization, while retaining essential predictors such as Competing Risks, Metastasis, Stage, and Lymph Node involvement.
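
The four comparison metrics named above follow standard definitions, sketched below in plain Python. The model names, log-likelihoods, parameter counts, and survival times are hypothetical values chosen for illustration, not figures from the study, and the sketch ignores censoring, which the abstract does not describe how the authors handled when computing MAE and MSE.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*logL (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def mae(observed, predicted) -> float:
    """Mean absolute error between observed and predicted times."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mse(observed, predicted) -> float:
    """Mean squared error between observed and predicted times."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical fitted models: maximized log-likelihood and parameter count.
candidates = {
    "model_a": {"loglik": -45.0, "k": 5},
    "model_b": {"loglik": -47.5, "k": 4},
}
n_obs = 60  # hypothetical number of observations

scores = {
    name: {
        "aic": aic(m["loglik"], m["k"]),
        "bic": bic(m["loglik"], m["k"], n_obs),
    }
    for name, m in candidates.items()
}

# The preferred model is the one with the smallest criterion value.
best_by_aic = min(scores, key=lambda name: scores[name]["aic"])

# Hypothetical uncensored survival times (months) and model predictions.
observed = [12.0, 30.0, 7.5]
predicted = [10.0, 33.0, 7.5]
prediction_errors = {"mae": mae(observed, predicted), "mse": mse(observed, predicted)}
```

Because AIC and BIC both grow with the number of observations through the log-likelihood, values such as 100.41 at the 25% sample and 384.58 at the 75% sample are only comparable between models fitted to the same sample, not across sample sizes.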

CONCLUSION

The Extreme Value Frailty Accelerated Failure Time (AFT) model demonstrated strong predictive performance in survival analysis, particularly when combined with LASSO regularization to enhance interpretability and generalizability. Key predictors, including Comorbidity, Metastasis, Stage, and Lymph Node involvement, remained significant after regularization, with reduced coefficients. Notably, patients without metastasis had 2.63 times longer expected survival than those with metastatic disease, while lower-stage diagnoses and minimal lymph node involvement contributed to 26% and 16% longer survival times, respectively. Other significant factors included recurrence status (19% increase in survival), HER2 negativity (20% longer survival), absence of the Triple Negative subtype (15% longer survival), and lower tumor grades (11% longer survival). By effectively shrinking less relevant variables, LASSO mitigated overfitting while preserving critical predictors, reinforcing the importance of tumor characteristics and molecular markers in survival outcomes. The study highlights the crucial role of risk stratification, as patients categorized into Low, Medium, and High-risk groups exhibit distinct survival patterns, aligning with the Extreme Value AFT Frailty Model. The forest plot analysis further validates the strong impact of significant covariates, with Competing Risks, Lymph Node Involvement, and Metastasis emerging as the most critical prognostic factors. Kaplan-Meier survival analysis reveals sharp survival declines associated with metastasis, lymph node involvement, tumor grade, HER2 status, and molecular subtypes, reinforcing the urgent need for early detection and targeted interventions. Notably, patients with Triple Negative and HER2-overexpressing subtypes exhibit the poorest survival outcomes, highlighting the necessity for subtype-specific therapies. Additionally, competing risks, particularly hospitalization-related factors, substantially impact survival, emphasizing the need for integrated treatment approaches. These findings emphasize the role of advanced statistical techniques in improving survival predictions, providing valuable insights that can enhance clinical decision-making in breast cancer prognosis and broader medical research.
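
The survival-time figures quoted above can be read through the usual AFT interpretation, in which a coefficient beta on the log-time scale corresponds to a time ratio exp(beta). The sketch below assumes that "2.63 times longer" denotes a time ratio of 2.63 and that "26% longer" denotes a time ratio of 1.26; the abstract does not report the coefficients themselves, so the values derived here are back-calculated illustrations and the function names are hypothetical.

```python
import math


def time_ratio(beta: float) -> float:
    """Multiplicative effect on expected survival time for an AFT coefficient."""
    return math.exp(beta)


def percent_change_in_survival_time(beta: float) -> float:
    """Percentage change in survival time implied by an AFT coefficient."""
    return (math.exp(beta) - 1.0) * 100.0


def coefficient_from_time_ratio(ratio: float) -> float:
    """Inverse mapping: the log-scale coefficient behind a given time ratio."""
    return math.log(ratio)


# Assumed reading of the abstract's figures (not reported coefficients).
beta_no_metastasis = coefficient_from_time_ratio(2.63)  # "2.63 times longer"
beta_lower_stage = coefficient_from_time_ratio(1.26)    # "26% longer"

print(f"no metastasis: beta = {beta_no_metastasis:.3f}, "
      f"time ratio = {time_ratio(beta_no_metastasis):.2f}")
print(f"lower stage:   beta = {beta_lower_stage:.3f}, "
      f"change = {percent_change_in_survival_time(beta_lower_stage):.0f}%")
```

A positive coefficient therefore lengthens expected survival and a negative one shortens it, which is why shrinking a coefficient toward zero under LASSO pulls its time ratio toward 1.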

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/818232cc3349/12885_2025_14040_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/4875de822612/12885_2025_14040_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/1ef6c3113215/12885_2025_14040_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/7180298427c9/12885_2025_14040_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/668b3b28536c/12885_2025_14040_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/c671c0af87be/12885_2025_14040_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/1af3a183cf5b/12885_2025_14040_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/ae9022ed1257/12885_2025_14040_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/58ead4c5e739/12885_2025_14040_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/6cb744f3b3ac/12885_2025_14040_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/42f1c0c06641/12885_2025_14040_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/9f68ac1d778a/12885_2025_14040_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/59b0aa5eeab9/12885_2025_14040_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5146/11987402/56849f908fd0/12885_2025_14040_Fig14_HTML.jpg


