Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, USA; Department of Acute and Tertiary Care, University of Tennessee Health Science Center, Memphis, USA.
Department of Mathematics, University of Memphis, Memphis, USA.
Methods. 2018 Aug 1;145:76-81. doi: 10.1016/j.ymeth.2018.05.011. Epub 2018 May 17.
Evaluating the differential expression of a set of genes belonging to a common biological process or ontology has proven useful for biological discovery. However, existing gene-set association methods are limited to applications that evaluate differential expression across k ≥ 2 treatment groups or biological categories. This limitation prevents researchers from effectively evaluating associations with other phenotypes that may be more clinically meaningful, such as quantitative variables or censored survival times. Projection onto the Orthogonal Space Testing (POST) is proposed as a general procedure that can robustly evaluate the association of a gene-set with several different types of phenotypic data (categorical, ordinal, continuous, or censored). For each gene-set, POST transforms the gene profiles into a set of eigenvectors and then uses statistical modeling to compute a set of z-statistics that measure the association of each eigenvector with the phenotype. The overall gene-set statistic is the sum of squared z-statistics weighted by the corresponding eigenvalues. Finally, bootstrapping is used to compute a p-value. POST can evaluate associations with or without adjustment for covariates. In simulation studies, POST performed as well as or better than existing methods in evaluating the association with a categorical phenotype. In evaluating the association of 875 biological processes with the time to relapse of pediatric acute myeloid leukemia, POST identified the well-known oncogenic WNT signaling pathway as its top hit. These results indicate that POST is a useful tool for evaluating the association of a gene-set with a variety of phenotypes. We have developed an R package named POST, which is freely available in Bioconductor.
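As a rough illustration of the statistic described above, the following Python sketch computes an eigenvalue-weighted sum of squared z-statistics for a single gene-set and a continuous phenotype. It is not the POST package's API or its exact procedure. The function names `post_like_statistic` and `post_like_pvalue` are hypothetical, the z-statistic is approximated as sqrt(n - 1) times the Pearson correlation, covariate adjustment is omitted, and a permutation null is swapped in for the bootstrap mentioned in the abstract to keep the example short.

```python
import numpy as np


def post_like_statistic(expr, pheno):
    """Eigenvalue-weighted sum of squared z-statistics for one gene-set.

    expr  : (n_samples, n_genes) expression matrix restricted to the gene-set
    pheno : (n_samples,) continuous phenotype
    Assumption: z_j = sqrt(n - 1) * Pearson r between projection j and pheno.
    """
    expr = np.asarray(expr, dtype=float)
    pheno = np.asarray(pheno, dtype=float)
    n = expr.shape[0]

    # Standardize each gene; constant genes are left at zero after centering.
    centered = expr - expr.mean(axis=0)
    sd = centered.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    scaled = centered / sd

    # Eigen-decomposition of the gene-gene correlation matrix.
    corr = scaled.T @ scaled / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)
    keep = eigvals > 1e-8  # drop numerically null directions
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]

    # Project the samples onto the retained eigenvectors.
    scores = scaled @ eigvecs

    # Pearson correlation of each projection with the phenotype.
    y = pheno - pheno.mean()
    r = (scores.T @ y) / (np.linalg.norm(scores, axis=0) * np.linalg.norm(y))
    z = np.sqrt(n - 1) * r

    # Sum of squared z-statistics, weighted by the eigenvalues.
    return float(np.sum(eigvals * z ** 2))


def post_like_pvalue(expr, pheno, n_perm=999, seed=0):
    """Permutation p-value (a stand-in for the bootstrap used by POST)."""
    rng = np.random.default_rng(seed)
    observed = post_like_statistic(expr, pheno)
    null = np.array([
        post_like_statistic(expr, rng.permutation(pheno))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    pval = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(pval)


# Usage on simulated data: 60 samples, 5 genes, phenotype driven by gene 0.
rng = np.random.default_rng(1)
expr_demo = rng.normal(size=(60, 5))
pheno_demo = expr_demo[:, 0] + 0.1 * rng.normal(size=60)
stat_demo, pval_demo = post_like_pvalue(expr_demo, pheno_demo, n_perm=199)
print(stat_demo, pval_demo)
```

Weighting by eigenvalue means that directions explaining more of the gene-set's expression variance contribute more to the overall statistic. Categorical, ordinal, or censored phenotypes would require replacing the correlation step with z-statistics from an appropriate model, which this sketch does not attempt.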