Yu Wenbao, Park Taesung
BMC Genomics. 2014;15 Suppl 10(Suppl 10):S1. doi: 10.1186/1471-2164-15-S10-S1. Epub 2014 Dec 12.
When multiple markers are available, it is common to seek an optimal combination of markers for disease classification and prediction. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing AUC-based work in the high-dimensional setting relies mainly on non-parametric, smooth approximations of the AUC; no existing work uses a parametric AUC-based approach for high-dimensional data.
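For reference, the quantities mentioned above can be written out as follows. The notation is assumed here for illustration and is not taken from the source: $X$ and $Y$ are marker vectors of a diseased and a non-diseased subject, $x_i$ ($i = 1, \dots, n_1$) and $y_j$ ($j = 1, \dots, n_0$) are the observed samples, $\beta$ is the coefficient vector of the linear combination, and $h > 0$ is a smoothing parameter. The sigmoid is one common choice of smooth surrogate for the indicator function, not necessarily the one used in every non-parametric method.

```latex
\mathrm{AUC}(\beta) = P\left(\beta^{\top} X > \beta^{\top} Y\right),
\qquad
\widehat{\mathrm{AUC}}(\beta) = \frac{1}{n_1 n_0} \sum_{i=1}^{n_1} \sum_{j=1}^{n_0}
  I\left(\beta^{\top} x_i > \beta^{\top} y_j\right),
\qquad
\widehat{\mathrm{AUC}}_h(\beta) = \frac{1}{n_1 n_0} \sum_{i=1}^{n_1} \sum_{j=1}^{n_0}
  \frac{1}{1 + \exp\left\{-\beta^{\top}(x_i - y_j)/h\right\}}.
```

The empirical AUC is a step function of $\beta$, which is why non-parametric methods replace the indicator with a smooth surrogate and must then choose $h$.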
We propose an AUC-based approach using penalized regression (AucPR), a parametric method for obtaining a linear combination of markers that maximizes the AUC. To obtain the AUC maximizer in a high-dimensional context, we recast a classical parametric AUC maximizer, which is used in a low-dimensional context, in a regression framework, so that penalized regression can be applied directly. Two kinds of penalty, lasso and elastic net, are considered. The parametric approach avoids some of the difficulties of conventional non-parametric AUC-based approaches, such as the lack of an appropriate concave objective function and the need for a prudent choice of the smoothing parameter. We apply the proposed AucPR to gene selection and classification using four real microarray data sets and synthetic data. In numerical studies, AucPR performs better than penalized logistic regression and the non-parametric AUC-based method in terms of AUC and sensitivity at a given specificity, particularly when there are many correlated genes.
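As a rough illustration of the ingredients named above, the following minimal sketch computes the two evaluation criteria (empirical AUC and sensitivity at a given specificity) and a closed-form low-dimensional direction under a binormal model. It assumes that the "classical parametric AUC maximizer" is the well-known binormal result, in which the coefficients are proportional to the inverse of the summed class covariances times the mean difference; that identification is an assumption, and the function names and simulated data are hypothetical. The sketch does not implement the penalized regression step of AucPR, whose details are not given in this abstract.

```python
import numpy as np


def empirical_auc(case_scores, control_scores):
    """Mann-Whitney estimate of P(case score > control score); ties count 1/2."""
    cases = np.asarray(case_scores, dtype=float)[:, None]
    controls = np.asarray(control_scores, dtype=float)[None, :]
    diff = cases - controls  # all case/control pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.95):
    """Fraction of cases scoring above the control quantile at `specificity`."""
    threshold = np.quantile(np.asarray(control_scores, dtype=float), specificity)
    return float(np.mean(np.asarray(case_scores, dtype=float) > threshold))


def binormal_auc_direction(X_case, X_control):
    """Assumed classical direction: (S_case + S_control)^{-1} (mean difference).

    Only usable in low dimensions, where the summed covariance is invertible.
    """
    mean_diff = X_case.mean(axis=0) - X_control.mean(axis=0)
    cov_sum = np.cov(X_case, rowvar=False) + np.cov(X_control, rowvar=False)
    return np.linalg.solve(cov_sum, mean_diff)


# Hypothetical simulated data: 3 markers, 200 subjects per class.
rng = np.random.default_rng(0)
X_control = rng.normal(0.0, 1.0, size=(200, 3))
X_case = rng.normal([1.0, 0.5, 0.0], 1.0, size=(200, 3))

beta = binormal_auc_direction(X_case, X_control)
auc = empirical_auc(X_case @ beta, X_control @ beta)
sens = sensitivity_at_specificity(X_case @ beta, X_control @ beta, 0.95)
print(f"in-sample AUC: {auc:.3f}, sensitivity at 95% specificity: {sens:.3f}")
```

When the number of genes exceeds the number of samples, the summed covariance matrix in `binormal_auc_direction` is singular and the linear solve fails, which is the kind of situation a penalized regression formulation is meant to handle.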
We propose AucPR, a powerful, parametric, and easily implementable linear classifier for gene selection and disease prediction with high-dimensional data. AucPR is recommended for its good prediction performance. Besides gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.