College of Public Health, The Ohio State University, Columbus, Ohio, USA.
Stat Med. 2021 Mar 15;40(6):1453-1481. doi: 10.1002/sim.8851. Epub 2020 Dec 18.
Many previous studies have identified associations between gene expression, measured using high-throughput genomic platforms, and quantitative or dichotomous traits. However, health outcomes and disease status are frequently measured on an ordinal scale; that is, the outcome is categorical but has an inherent ordering. Identifying important genes may help in developing novel diagnostic and prognostic tools to predict or classify disease stage. Gene expression data are usually high-dimensional, meaning that the number of genes is much larger than the sample size (the number of patients). Herein we describe some existing frequentist methods for modeling an ordinal response in a high-dimensional predictor space. Following Tibshirani (1996), who described the LASSO estimate as the Bayesian posterior mode when the regression coefficients have independent Laplace priors, we propose a new approach, rooted in the Bayesian paradigm, for high-dimensional data with an ordinal response. Through simulation studies, we show that our proposed Bayesian approach outperforms existing frequentist methods. We then compare the performance of the frequentist and Bayesian approaches using a study evaluating progression to hepatocellular carcinoma in hepatitis C-infected patients.
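The correspondence attributed to Tibshirani (1996) can be checked numerically for an ordinal outcome. The sketch below is illustrative only and makes several assumptions not stated in the abstract: a cumulative logit model P(Y <= k | x) = sigmoid(alpha_k - x'beta) with increasing cutpoints alpha, independent Laplace(0, 1/lam) priors on beta, and a flat prior on the cutpoints. The function names are hypothetical and are not the authors' implementation. Under these assumptions, the negative log posterior and the L1-penalized negative log-likelihood differ only by the constant -p * log(lam / 2), so both criteria are minimized by the same beta. In other words, the LASSO estimate is the posterior mode.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def ordinal_loglik(alpha, beta, X, y):
    """Log-likelihood of a cumulative logit model (assumed model).

    alpha: increasing cutpoints alpha_1 < ... < alpha_{K-1}
    beta:  one regression coefficient per predictor (e.g. per gene)
    X:     list of predictor rows
    y:     ordinal labels coded 1..K
    """
    K = len(alpha) + 1
    total = 0.0
    for x, k in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x))
        # P(Y = k) = P(Y <= k) - P(Y <= k - 1); the boundary CDF values are 1 and 0.
        upper = 1.0 if k == K else sigmoid(alpha[k - 1] - eta)
        lower = 0.0 if k == 1 else sigmoid(alpha[k - 2] - eta)
        total += math.log(upper - lower)
    return total


def lasso_objective(alpha, beta, X, y, lam):
    """Frequentist criterion: negative log-likelihood plus an L1 penalty on beta only."""
    return -ordinal_loglik(alpha, beta, X, y) + lam * sum(abs(b) for b in beta)


def neg_log_posterior(alpha, beta, X, y, lam):
    """Bayesian criterion: iid Laplace(0, 1/lam) priors on beta and a flat prior on alpha."""
    # Laplace log-density for each coefficient: log(lam / 2) - lam * |b|
    log_prior = sum(math.log(lam / 2.0) - lam * abs(b) for b in beta)
    return -ordinal_loglik(alpha, beta, X, y) - log_prior


# Toy data (made up for illustration): 4 patients, 3 predictors, K = 3 ordered stages.
X = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [-0.2, 0.8, 0.1], [2.0, -0.5, 1.0]]
y = [1, 3, 2, 3]
alpha = [-0.5, 1.0]
lam = 0.7

for beta in ([0.0, 0.0, 0.0], [0.4, -1.2, 0.3]):
    gap = neg_log_posterior(alpha, beta, X, y, lam) - lasso_objective(alpha, beta, X, y, lam)
    # The gap equals -p * log(lam / 2) and does not depend on beta.
    print(round(gap, 6))
```

Because the gap does not depend on beta, any optimizer applied to either criterion returns the same point estimate. A fully Bayesian analysis can go further and use the whole posterior distribution, not just its mode.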