Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data.

Affiliation

Biometric Research Branch, US National Cancer Institute, Bethesda, MD 20892-7434, USA.

Publication information

Brief Bioinform. 2011 May;12(3):203-14. doi: 10.1093/bib/bbr001. Epub 2011 Feb 15.

DOI:10.1093/bib/bbr001

PMID:21324971

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3105299/

Abstract

Developments in whole genome biotechnology have stimulated statistical focus on prediction methods. We review here methodology for classifying patients into survival risk groups and for using cross-validation to evaluate such classifications. Measures of discrimination for survival risk models include separation of survival curves, time-dependent ROC curves and Harrell's concordance index. For high-dimensional data applications, however, computing these measures as re-substitution statistics on the same data used for model development results in highly biased estimates. Most developments in methodology for survival risk modeling with high-dimensional data have utilized separate test data sets for model evaluation. Cross-validation has sometimes been used for optimization of tuning parameters. In many applications, however, the data available are too limited for effective division into training and test sets and consequently authors have often either reported re-substitution statistics or analyzed their data using binary classification methods in order to utilize familiar cross-validation. In this article we have tried to indicate how to utilize cross-validation for the evaluation of survival risk models; specifically how to compute cross-validated estimates of survival distributions for predicted risk groups and how to compute cross-validated time-dependent ROC curves. We have also discussed evaluation of the statistical significance of a survival risk model and evaluation of whether high-dimensional genomic data adds predictive accuracy to a model based on standard covariates alone.
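As an illustration of one of the discrimination measures named in the abstract, here is a minimal sketch of Harrell's concordance index in plain Python. It is not taken from the article: the function name `harrell_c_index`, its arguments, and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification. For a cross-validated estimate, the risk score passed in for each patient would need to come from a model fitted without that patient's data; scoring patients with a model developed on the same data gives the re-substitution statistic that the abstract describes as highly biased.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of usable patient pairs ordered correctly by risk score.

    times:       follow-up time for each patient
    events:      1/True if the event was observed, 0/False if censored
    risk_scores: predicted risk (higher means shorter expected survival)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): skip tied follow-up times
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true ordering is unknown
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk for the earlier event: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if usable == 0:
        raise ValueError("no usable pairs: cannot compute the index")
    return concordant / usable


# Toy data (hypothetical): four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.7, 0.4, 0.1]
c_index = harrell_c_index(times, events, risk_scores)
```

A value near 0.5 indicates no discrimination and a value near 1 indicates that patients with higher predicted risk consistently fail earlier.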

