Archer Kellie J, Seffernick Anna Eames, Sun Shuai, Zhang Yiran
Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH 43210, USA.
Amgen Inc., 1 Amgen Center Dr, Thousand Oaks, CA 91320, USA.
Stats (Basel). 2022 Jun;5(2):371-384. doi: 10.3390/stats5020021. Epub 2022 Apr 15.
The stage of cancer is a discrete ordinal response that indicates the aggressiveness of disease and is often used by physicians to determine the type and intensity of treatment to be administered. For example, the FIGO stage in cervical cancer is based on the size and depth of the tumor as well as the level of spread. It may be of clinical relevance to identify molecular features from high-throughput genomic assays that are associated with the stage of cervical cancer to elucidate pathways related to tumor aggressiveness, identify improved molecular features that may be useful for staging, and identify therapeutic targets. High-throughput RNA-Seq data and corresponding clinical data (including stage) for cervical cancer patients have been made available through The Cancer Genome Atlas Project (TCGA). We recently described penalized Bayesian ordinal response models that can be used for variable selection for over-parameterized datasets, such as the TCGA-CESC dataset. Herein, we describe our ordinalbayes R package, available from the Comprehensive R Archive Network (CRAN), which enhances the runjags R package by enabling users to easily fit cumulative logit models when the outcome is ordinal and the number of predictors exceeds the sample size, p > n, such as for TCGA and other high-throughput genomic data. We demonstrate the use of this package by applying it to the TCGA cervical cancer dataset. Our ordinalbayes package can be used to fit models to high-dimensional datasets, and it effectively performs variable selection.
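For readers unfamiliar with the cumulative logit model named in the abstract, the following is a minimal Python sketch of how such a model turns a linear predictor into probabilities for each ordinal category (for example, each stage). It is not the ordinalbayes API and does not perform Bayesian fitting or variable selection. The function name `cumulative_logit_probs`, the parameter names, and the sign convention logit P(Y <= k | x) = alpha_k - x'beta are assumptions chosen for illustration; the package may parameterize the model differently.

```python
import math


def cumulative_logit_probs(x, beta, thresholds):
    """Category probabilities P(Y = k), k = 1..K, under a cumulative logit model.

    Assumed (illustrative) parameterization:
        logit P(Y <= k | x) = thresholds[k-1] - sum_j x[j] * beta[j]
    with K - 1 strictly increasing thresholds.
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")

    # Linear predictor shared by all categories (proportional odds assumption).
    eta = sum(xi * bi for xi, bi in zip(x, beta))

    # Cumulative probabilities P(Y <= k) for k = 1..K-1, then P(Y <= K) = 1.
    cumulative = [1.0 / (1.0 + math.exp(-(a - eta))) for a in thresholds]
    cumulative.append(1.0)

    # Differences of adjacent cumulative probabilities give P(Y = k).
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


# Hypothetical example: two predictors, four ordered categories.
example = cumulative_logit_probs(x=[1.0, 0.5], beta=[0.8, -0.4], thresholds=[-1.0, 0.0, 1.5])
print(example)
```

Under this sign convention, a larger linear predictor shifts probability mass toward higher categories. In a p > n setting most coefficients in `beta` would be expected to be zero or near zero, which is the motivation for the penalized variable selection priors the abstract describes.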