Model selection based on combined penalties for biomarker identification.

Author information

Vradi Eleni, Brannath Werner, Jaki Thomas, Vonk Richardus

Affiliations

a Department of Research and Clinical Sciences Statistics, Bayer AG, Berlin, Germany.

b Institute of Statistics, Competence Center for Clinical Trials Bremen, Faculty 3, University of Bremen, Bremen, Germany.

Publication information

J Biopharm Stat. 2018;28(4):735-749. doi: 10.1080/10543406.2017.1378662. Epub 2017 Oct 26.

DOI:10.1080/10543406.2017.1378662

PMID:29072549

Abstract

The growing role of targeted medicine has led to an increased focus on the development of actionable biomarkers. Current penalized selection methods that are used to identify biomarker panels for classification in high-dimensional data, however, often result in highly complex panels that need careful pruning for practical use. In the framework of regularization methods, a penalty that is a weighted sum of the L0 and L1 norm has been proposed to account for the complexity of the resulting model. In practice, the limitation of this penalty is that the objective function is non-convex, non-smooth, the optimization is computationally intensive and the application to high-dimensional settings is challenging. In this paper, we propose a stepwise forward variable selection method which combines the L0 with L1 or L2 norms. The penalized likelihood criterion that is used in the stepwise selection procedure results in more parsimonious models, keeping only the most relevant features. Simulation results and a real application show that our approach exhibits a comparable performance with common selection methods with respect to the prediction performance while minimizing the number of variables in the selected model resulting in a more parsimonious model as desired.
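
To make the idea of a stepwise forward search under a combined penalty concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the exact criterion, so everything here is an assumption for illustration: a logistic model, the criterion "negative log-likelihood + lam2 * squared L2 norm of the coefficients + lam0 * number of selected variables", a plain gradient-descent fit, and the hypothetical names `penalized_fit`, `forward_select`, `lam0`, and `lam2`. Only the L0 with L2 combination is shown; the L0 with L1 variant would need a different inner solver.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def penalized_fit(X, y, cols, lam2, steps=200, lr=0.5):
    """Fit a logistic model on columns `cols` by gradient descent and return
    the negative log-likelihood plus lam2 * ||beta||_2^2 (intercept unpenalized)."""
    n = len(y)
    b0, b = 0.0, [0.0] * len(cols)
    for _ in range(steps):
        g0, g = 0.0, [0.0] * len(cols)
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xi[c] for bj, c in zip(b, cols))) - yi
            g0 += r
            for j, c in enumerate(cols):
                g[j] += r * xi[c]
        b0 -= lr * g0 / n
        b = [bj - lr * (gj + 2.0 * lam2 * bj) / n for bj, gj in zip(b, g)]
    nll = 0.0
    for xi, yi in zip(X, y):
        eta = b0 + sum(bj * xi[c] for bj, c in zip(b, cols))
        # log(1 + exp(eta)) - y * eta, written in an overflow-safe form
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    return nll + lam2 * sum(bj * bj for bj in b)

def forward_select(X, y, lam0, lam2):
    """Greedy forward selection: add the variable that most lowers the
    assumed criterion; stop when no addition lowers it."""
    selected = []
    remaining = list(range(len(X[0])))
    best = penalized_fit(X, y, selected, lam2)  # intercept-only model, L0 term is 0
    while remaining:
        scores = [
            (penalized_fit(X, y, selected + [c], lam2) + lam0 * (len(selected) + 1), c)
            for c in remaining
        ]
        score, c = min(scores)
        if score >= best:
            break
        best = score
        selected.append(c)
        remaining.remove(c)
    return selected

# Synthetic data (an assumption for the demo): only columns 0 and 1 are informative.
rng = random.Random(0)
n, p = 150, 5
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(2.0 * xi[0] - 2.0 * xi[1]) else 0 for xi in X]

selected = forward_select(X, y, lam0=math.log(n) / 2.0, lam2=0.1)
print(selected)
```

In this sketch the L0 term acts as a fixed cost per added variable, so a candidate enters only if it improves the ridge-penalized fit by more than `lam0`. The value `lam0 = log(n) / 2` is a BIC-like choice made purely for the demo; how the penalty weights are tuned in the paper is not described in the abstract.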
