Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection.

Authors

Mayr Andreas, Hofner Benjamin, Schmid Matthias

Affiliations

Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Waldstr. 6, Erlangen, 91054, Germany.

Institut für Medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Sigmund-Freud-Str. 25, Bonn, 53105, Germany.

Publication information

BMC Bioinformatics. 2016 Jul 22;17:288. doi: 10.1186/s12859-016-1149-8.

DOI: 10.1186/s12859-016-1149-8
PMID: 27444890
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4957316/

Abstract

BACKGROUND

When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done by fitting Cox models. These models, however, are not necessarily optimal with respect to the resulting discriminatory power, and they rest on restrictive assumptions. We present a combined approach to automatically select and fit sparse discrimination models for potentially high-dimensional survival data based on boosting a smooth version of the concordance index (C-index). Because of this objective function, the resulting prediction models are optimal with respect to their ability to discriminate between patients with longer and shorter survival times. The gradient boosting algorithm is combined with the stability selection approach to enhance and control its variable selection properties.
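
To make the idea of a "smooth version of the concordance index" concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a higher score means higher risk (shorter survival), that a pair is comparable only when the subject with the shorter time had an observed event, and that the indicator function is replaced by a sigmoid with a hypothetical smoothing parameter `sigma`. It omits any censoring weights, which a full estimator would need.

```python
import numpy as np

def c_index(time, event, risk, sigma=None):
    """Concordance between risk scores and survival times.

    Assumptions of this sketch: higher `risk` means shorter survival, and a
    pair (i, j) is comparable when subject i had an observed event and
    time[i] < time[j]. No censoring weights are applied.
    With sigma=None the hard indicator is used; otherwise the indicator is
    replaced by a sigmoid, which makes the criterion differentiable.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    # comparable[i, j] is True when i failed before j was last observed
    comparable = event[:, None] & (time[:, None] < time[None, :])
    # diff[i, j] = risk_i - risk_j; positive values are concordant
    diff = risk[:, None] - risk[None, :]

    if sigma is None:
        concordant = (diff > 0).astype(float)
    else:
        concordant = 1.0 / (1.0 + np.exp(-diff / sigma))

    return float((concordant * comparable).sum() / comparable.sum())
```

For small `sigma` the smooth value approaches the hard C-index, while larger values give a smoother but more biased surrogate. This trade-off is why the smoothing parameter is treated as a tuning choice in this sketch.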

RESULTS

The resulting algorithm fits prediction models based on the rankings of the survival times and automatically selects only the most stable predictors. The performance of the approach, which works best for small numbers of informative predictors, is demonstrated in a large-scale simulation study: C-index boosting in combination with stability selection is able to identify a small subset of informative predictors from a much larger set of non-informative ones while controlling the per-family error rate. In an application to discover biomarkers for breast cancer patients based on gene expression data, stability selection yielded sparser models, and the resulting discriminatory power was higher than with lasso-penalized Cox regression models.
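
The stability selection step can be illustrated with a minimal sketch under stated assumptions. The function names, the subsample count, and the threshold are hypothetical. The base selector here is a simple top-q correlation filter on a continuous outcome, used only as a stand-in for C-index boosting, which the sketch does not implement.

```python
import numpy as np

def stability_selection(X, y, select, n_subsamples=100, threshold=0.9, seed=0):
    """Return the indices of stable predictors and all selection frequencies.

    `select(X_sub, y_sub)` is any procedure that returns the indices of the
    predictors it picks on a subsample (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # draw a random half of the observations without replacement
        idx = rng.choice(n, size=n // 2, replace=False)
        selected = np.asarray(list(select(X[idx], y[idx])), dtype=int)
        counts[selected] += 1
    freq = counts / n_subsamples
    # keep only predictors selected in at least `threshold` of the subsamples
    return np.flatnonzero(freq >= threshold), freq

def top_q_by_correlation(q):
    """Stand-in selector: the q predictors most correlated with y."""
    def select(X, y):
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
        corr = np.abs(Xc.T @ yc) / denom
        return np.argsort(corr)[-q:]
    return select
```

A possible call is `stability_selection(X, y, top_q_by_correlation(2))`. Predictors that are picked only occasionally across subsamples fall below the threshold and are dropped, which is what yields the sparser models.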

CONCLUSION

The combination of stability selection and C-index boosting can be used to select small numbers of informative biomarkers and to derive new prediction rules that are optimal with respect to their discriminatory power. Stability selection controls the per-family error rate, which makes the new approach also appealing from an inferential point of view, as it provides an alternative to classical hypothesis tests for single predictor effects. Because of the shrinkage and variable selection properties of statistical boosting algorithms, such tests are typically infeasible for prediction models fitted by boosting.
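
The abstract does not state how the per-family error rate (PFER) is bounded. As an illustration only, the standard bound from the original stability selection literature (Meinshausen and Bühlmann) is assumed here. The symbols are not taken from the abstract: $V$ is the number of falsely selected predictors, $q$ the number of predictors selected per subsample, $p$ the total number of predictors, and $\pi_{\text{thr}}$ the selection-frequency threshold.

```latex
\mathbb{E}(V) \;\le\; \frac{q^{2}}{(2\pi_{\text{thr}} - 1)\, p},
\qquad \pi_{\text{thr}} \in (0.5,\, 1]

% Worked example with assumed values p = 1000, q = 10, \pi_{\text{thr}} = 0.9:
\mathbb{E}(V) \;\le\; \frac{10^{2}}{(2 \cdot 0.9 - 1)\cdot 1000}
            \;=\; \frac{100}{800} \;=\; 0.125
```

Under this bound, fixing any two of $q$, $\pi_{\text{thr}}$, and the tolerated PFER determines the third.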

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7131/4957316/675946e34c8e/12859_2016_1149_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7131/4957316/3c557a7dd4c7/12859_2016_1149_Fig2_HTML.jpg
