Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA.
Stat Med. 2012 Jun 30;31(14):1464-74. doi: 10.1002/sim.4484. Epub 2012 Feb 23.
Health status and outcomes are frequently measured on an ordinal scale. For high-throughput genomic datasets, the common approach to analyzing ordinal response data has been to break the problem into one or more dichotomous response analyses. This dichotomous response approach does not make use of all available data and therefore leads to a loss of power and an increase in the number of type I errors. Herein we describe an innovative frequentist approach that combines two statistical techniques, L1 penalization and continuation ratio models, for modeling an ordinal response using gene expression microarray data. We conducted a simulation study to assess the performance of two computational approaches and two model selection criteria for fitting frequentist L1 penalized continuation ratio models. Moreover, we empirically compared the approaches using three application datasets, each of which seeks to classify subjects into ordinal classes using microarray gene expression data as the predictor variables. We conclude that the L1 penalized constrained continuation ratio model is a useful approach for modeling an ordinal response in datasets where the number of covariates (p) exceeds the sample size (n), and that the choice between the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) for selecting the final model should depend on the similarity between the pathologies underlying the disease states to be classified.
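The combination of an L1 penalty with a constrained continuation ratio model, followed by AIC or BIC model selection, can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions: it uses the forward formulation logit P(Y = k | Y ≥ k, x) = α_k + xᵀβ with one slope vector β shared across cutpoints (our reading of "constrained"), fits it by proximal gradient descent (ISTA) instead of either of the paper's computational approaches, counts degrees of freedom as the K − 1 intercepts plus the nonzero β, and uses the number of subjects as n in BIC. All function names (`cr_expand`, `fit_l1_cr`, `loglik`, `fit_path`) and the toy data are hypothetical.

```python
import math
import random


def cr_expand(X, y, K):
    """Restructure ordinal data (classes 1..K) for a forward continuation-ratio model.

    Subject i with class y_i contributes one binary row per cutpoint
    k = 1..min(y_i, K-1): response 1 if y_i == k (stops at k), else 0.
    """
    rows, resp, cut = [], [], []
    for xi, yi in zip(X, y):
        for k in range(1, min(yi, K - 1) + 1):
            rows.append(list(xi))
            resp.append(1 if yi == k else 0)
            cut.append(k - 1)  # zero-based index of the cutpoint intercept
    return rows, resp, cut


def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def linpred(r, c, alpha, beta):
    # Cutpoint-specific intercept plus a slope vector shared by all cutpoints.
    return alpha[c] + sum(b * v for b, v in zip(beta, r))


def fit_l1_cr(rows, resp, cut, K, lam, alpha=None, beta=None, iters=300):
    """ISTA on mean negative log-likelihood + lam * sum|beta_j| (assumed solver).

    Intercepts alpha_k are unpenalized; only beta is soft-thresholded.
    """
    N, p = len(rows), len(rows[0])
    alpha = list(alpha) if alpha is not None else [0.0] * (K - 1)
    beta = list(beta) if beta is not None else [0.0] * p
    # Safe step 1/L, where L = 0.25 * max ||z_i||^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (0.25 * max(1.0 + sum(v * v for v in r) for r in rows))
    for _ in range(iters):
        ga, gb = [0.0] * (K - 1), [0.0] * p
        for r, yr, c in zip(rows, resp, cut):
            # sigmoid(eta) computed stably as exp(-softplus(-eta)); res = p_hat - y
            res = math.exp(-softplus(-linpred(r, c, alpha, beta))) - yr
            ga[c] += res / N
            for j in range(p):
                gb[j] += res * r[j] / N
        alpha = [a - step * g for a, g in zip(alpha, ga)]
        # Gradient step on beta, then soft-thresholding (the L1 proximal operator).
        z = [b - step * g for b, g in zip(beta, gb)]
        beta = [math.copysign(max(abs(v) - step * lam, 0.0), v) for v in z]
    return alpha, beta


def loglik(rows, resp, cut, alpha, beta):
    # The CR likelihood factorizes into independent Bernoulli terms over expanded rows.
    ll = 0.0
    for r, yr, c in zip(rows, resp, cut):
        eta = linpred(r, c, alpha, beta)
        ll -= softplus(-eta) if yr else softplus(eta)
    return ll


def fit_path(X, y, K, lams):
    """Fit along a decreasing lambda grid with warm starts; score each fit by AIC and BIC."""
    rows, resp, cut = cr_expand(X, y, K)
    alpha = beta = None
    path = []
    for lam in sorted(lams, reverse=True):
        alpha, beta = fit_l1_cr(rows, resp, cut, K, lam, alpha, beta)
        dev = -2.0 * loglik(rows, resp, cut, alpha, beta)
        df = (K - 1) + sum(1 for b in beta if b != 0.0)  # intercepts + active genes (assumed)
        path.append({"lam": lam, "beta": beta,
                     "aic": dev + 2 * df, "bic": dev + math.log(len(y)) * df})
    return path, min(path, key=lambda m: m["aic"]), min(path, key=lambda m: m["bic"])


# Hypothetical toy data: 80 subjects, 5 "genes", only gene 0 drives the ordinal class.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(80)]
y = [1 if s < -0.8 else 2 if s < 0.8 else 3
     for s in (1.5 * x[0] + random.gauss(0, 1) for x in X)]
path, best_aic, best_bic = fit_path(X, y, K=3, lams=[0.5, 0.2, 0.1, 0.05, 0.02, 0.01])
print("AIC picks lambda", best_aic["lam"], "| BIC picks lambda", best_bic["lam"])
```

Because BIC multiplies each active coefficient by log(n) rather than 2, it tends to favor sparser models than AIC along the same penalty path, which is why the choice between the two criteria can change which genes end up in the final classifier.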