Ma Shuangge, Huang Jian
Department of Biostatistics, University of Washington, Washington, USA.
Bioinformatics. 2005 Dec 15;21(24):4356-62. doi: 10.1093/bioinformatics/bti724. Epub 2005 Oct 18.
An important application of microarrays is to discover genomic biomarkers for disease classification among the tens of thousands of genes assayed. There is therefore a need for statistical methods that can use such high-throughput genomic data efficiently, select biomarkers with discriminant power and construct classification rules. The ROC (receiver operating characteristic) technique has been widely used in disease classification with low-dimensional biomarkers because (1) it does not assume a parametric form for the class probability, as is required, for example, in logistic regression; (2) it accommodates case-control designs; and (3) it allows false positives and false negatives to be treated differently. However, because of computational difficulties, ROC-based classification has not been used with microarray data. Moreover, the standard ROC technique has no built-in biomarker selection.
We propose a novel method for biomarker selection and classification using the ROC technique for microarray data. The method uses a sigmoid approximation to the area under the ROC curve as the objective function for classification, and the threshold gradient descent regularization method for estimation and biomarker selection. Tuning parameter selection based on V-fold cross-validation and evaluation of predictive performance are also investigated. The approach is demonstrated with a simulation study, the Colon data and the Estrogen data, and it yields parsimonious models with excellent classification performance.
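As a rough illustration of the two ingredients named above, the following Python sketch maximizes a sigmoid-smoothed AUC with a thresholded gradient update, so that only coordinates whose gradient is close to the largest one are changed at each step. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the smoothing scale `sigma`, the threshold `tau`, the step size, the number of steps and the synthetic data are all hypothetical choices, and details the abstract does not specify (such as normalization of the coefficient vector or how the tuning parameters are chosen by cross-validation) are omitted.

```python
import numpy as np

def sigmoid(u):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * u))

def smoothed_auc_gradient(beta, cases, controls, sigma=1.0):
    # Gradient of mean_ij sigmoid(beta'(x_i - y_j) / sigma) with respect to beta,
    # where x_i are case profiles and y_j are control profiles.
    diff = (cases @ beta)[:, None] - (controls @ beta)[None, :]
    s = sigmoid(diff / sigma)
    w = s * (1.0 - s) / sigma  # pairwise weights, shape (n_cases, n_controls)
    return (w.sum(axis=1) @ cases - w.sum(axis=0) @ controls) / w.size

def threshold_gradient_ascent(cases, controls, tau=0.9, step=0.05,
                              n_steps=50, sigma=1.0):
    # Thresholded update: move only coordinates with |g_k| >= tau * max|g|.
    # tau near 1 favors sparse solutions; tau = 0 is plain gradient ascent.
    beta = np.zeros(cases.shape[1])
    for _ in range(n_steps):
        g = smoothed_auc_gradient(beta, cases, controls, sigma)
        g_max = np.max(np.abs(g))
        if g_max == 0.0:
            break
        mask = np.abs(g) >= tau * g_max
        beta += step * mask * g
    return beta

def empirical_auc(beta, cases, controls):
    # Fraction of case-control pairs ranked correctly by the linear score.
    diff = (cases @ beta)[:, None] - (controls @ beta)[None, :]
    return float(np.mean(diff > 0))

# Hypothetical synthetic data: 30 "genes", only the first two are informative.
rng = np.random.default_rng(0)
controls = rng.normal(size=(200, 30))
cases = rng.normal(size=(200, 30))
cases[:, :2] += 2.0

beta = threshold_gradient_ascent(cases, controls)
print("nonzero coefficients:", np.flatnonzero(beta))
print("training AUC:", empirical_auc(beta, cases, controls))
```

In this sketch the number of steps and `tau` jointly play the role of tuning parameters; in practice they would be chosen on held-out folds rather than fixed in advance, and the training AUC printed here is optimistic compared with a cross-validated estimate.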