Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models.

Author information

Ternès Nils, Rotolo Federico, Michiels Stefan

Affiliations

Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM, F-94805, Villejuif, France.

Gustave Roussy, Service de biostatistique et d'épidémiologie, F-94805, Villejuif, France.

Publication information

Stat Med. 2016 Jul 10;35(15):2561-73. doi: 10.1002/sim.6927. Epub 2016 Mar 10.

DOI:10.1002/sim.6927

PMID:26970107

Abstract

Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data increases. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen by maximizing the cross-validated log-likelihood (max-cvl). However, this method often has a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl that adds a penalization term trading off the goodness-of-fit against the parsimony of the model. This leads to the selection of fewer biomarkers and, as we show, to a reduction of the FDR without a large increase in the FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one-standard-error rule, adaptive lasso, stability selection, and percentile lasso. Across all scenarios, our extension achieved the best compromise between a reduction of the FDR and a limited increase in the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose applying our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd.
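The idea of penalizing the cvl to favor sparser models can be illustrated with a minimal Python sketch. This is not the formula from the paper, which the abstract does not state: the function names, the linear penalty `weight * n_selected`, and the `weight` parameter are all assumptions made for illustration. The FDR and FNR helper uses one common convention (false positives over selected, false negatives over true biomarkers), which may differ from the definitions used by the authors.

```python
from typing import Sequence, Set, Tuple


def select_lambda_penalized(
    lambdas: Sequence[float],
    cvl: Sequence[float],
    n_selected: Sequence[int],
    weight: float,
) -> float:
    """Pick the lambda maximizing cvl(lambda) - weight * n_selected(lambda).

    Hypothetical criterion: `weight` is an illustrative trade-off parameter,
    not the penalization term defined in the paper. With weight = 0 this
    reduces to the usual max-cvl rule.
    """
    if len(lambdas) == 0 or not (len(lambdas) == len(cvl) == len(n_selected)):
        raise ValueError("inputs must be non-empty and of equal length")
    scores = [c - weight * k for c, k in zip(cvl, n_selected)]
    # Index of the best score; ties resolve to the first candidate.
    best = max(range(len(scores)), key=scores.__getitem__)
    return lambdas[best]


def fdr_fnr(selected: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Return (FDR, FNR) under an assumed convention.

    FDR = false positives / number selected (0 if nothing is selected).
    FNR = false negatives / number of true biomarkers (0 if there are none).
    """
    false_pos = len(selected - truth)
    false_neg = len(truth - selected)
    fdr = false_pos / len(selected) if selected else 0.0
    fnr = false_neg / len(truth) if truth else 0.0
    return fdr, fnr
```

For example, with made-up values `lambdas = [0.30, 0.20, 0.10, 0.05]`, `cvl = [-105.0, -101.0, -100.0, -99.5]`, and `n_selected = [0, 2, 6, 15]`, a weight of 0 picks the smallest λ (the max-cvl choice, 15 biomarkers), whereas a weight of 1 picks λ = 0.20 (2 biomarkers), showing how a penalty shifts the choice toward a more conservative λ.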
