J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA.
Department of Surgery, University of Rochester, 601 Elmwood Ave, Rochester, NY 14642, USA.
BMC Bioinformatics. 2019 Apr 15;20(1):185. doi: 10.1186/s12859-019-2783-8.
For many practical hypothesis testing (H-T) applications, the data are correlated and/or have a heterogeneous variance structure. The regression t-test for weighted linear mixed-effects regression (LMER) is a legitimate choice because it accounts for complex covariance structure; however, high computational costs and occasional convergence issues make it impractical for analyzing high-throughput data. In this paper, we propose computationally efficient parametric and semiparametric tests based on a set of specialized matrix techniques dubbed the PB-transformation. The PB-transformation has two advantages: (1) the PB-transformed data have a scalar variance-covariance matrix, and (2) the original H-T problem is reduced to an equivalent one-sample H-T problem. The transformed problem can then be approached with either the one-sample Student's t-test or the Wilcoxon signed rank test.
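The two properties listed above can be illustrated with a minimal Python sketch. It is not the authors' PB-transformation or the PBtest package: it assumes a simplified model y = beta * x + noise with a known covariance matrix, whitens the data with a Cholesky factor, and then applies a Householder reflection so that the mean becomes a constant vector. The function name `pb_like_transform`, the toy design `x`, and the covariance `sigma` are all hypothetical.

```python
import numpy as np
from scipy import stats


def pb_like_transform(x, sigma):
    """Return a matrix T such that T @ sigma @ T.T is the identity and
    T @ x is a constant vector (illustrative only, not the PBtest code)."""
    n = len(x)
    L = np.linalg.cholesky(sigma)        # sigma = L @ L.T
    L_inv = np.linalg.inv(L)             # whitening step: cov(L_inv @ y) = I
    u = L_inv @ x                        # design vector after whitening
    a = u / np.linalg.norm(u)            # unit vector along the whitened design
    b = np.ones(n) / np.sqrt(n)          # unit vector along the constant direction
    v = a - b
    if np.linalg.norm(v) < 1e-12:        # already aligned, no rotation needed
        return L_inv
    v = v / np.linalg.norm(v)
    H = np.eye(n) - 2.0 * np.outer(v, v)  # Householder reflection with H @ a = b
    return H @ L_inv


# Hypothetical toy data: three correlated pairs plus two independent samples.
n = 8
sigma = np.eye(n)
for i in (0, 2, 4):
    sigma[i, i + 1] = sigma[i + 1, i] = 0.8
x = np.array([1.0, -1.0] * 4)            # assumed group contrast

rng = np.random.default_rng(0)
y = 0.5 * x + np.linalg.cholesky(sigma) @ rng.standard_normal(n)

T = pb_like_transform(x, sigma)
w = T @ y                                # uncorrelated, equal variance, constant mean

t_res = stats.ttest_1samp(w, 0.0)        # parametric one-sample test
w_res = stats.wilcoxon(w)                # signed rank alternative
print(t_res.pvalue, w_res.pvalue)
```

Because the reflection is orthogonal, it leaves the identity covariance produced by the whitening step unchanged, which is why an ordinary one-sample test can be applied to `w`. In practice the covariance would have to be estimated rather than assumed known.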
In simulation studies, the proposed methods outperform commonly used alternative methods under both normal and double exponential distributions. In particular, the PB-transformed t-test produces notably better results than the weighted LMER test, especially in the high correlation case, at only a small fraction of the computational cost (3 s versus 933 s). We apply these two methods to a set of RNA-seq gene expression data collected in a breast cancer study. Pathway analyses show that the PB-transformed t-test reveals more biologically relevant findings in relation to breast cancer than the weighted LMER test.
As fast and numerically stable replacements for the weighted LMER test, the PB-transformed tests are especially suitable for "messy" high-throughput data that include both independent and matched/repeated samples. With our method, practitioners no longer have to choose between using partial data (applying paired tests to only the matched samples) and ignoring the correlation in the data (applying two-sample tests to data with some correlated samples). Our method is implemented as an R package, 'PBtest', available at https://github.com/yunzhang813/PBtest-R-Package.
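The dilemma described above can be made concrete with a short Python sketch of the two conventional options. The data are simulated and purely hypothetical (five matched subjects plus a few unmatched samples per group); the code shows only the two workarounds, not the PB-transformed tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical "messy" design: 5 matched subjects measured in groups A and B,
# plus 4 unmatched samples in A and 3 unmatched samples in B.
subject_effect = rng.standard_normal(5)
a_matched = subject_effect + 0.3 * rng.standard_normal(5)
b_matched = subject_effect + 0.5 + 0.3 * rng.standard_normal(5)
a_extra = rng.standard_normal(4)
b_extra = 0.5 + rng.standard_normal(3)

# Option 1: paired test on the matched samples only (discards 7 samples).
paired = stats.ttest_rel(a_matched, b_matched)

# Option 2: two-sample (Welch) test on all samples, which treats the
# matched samples as if they were independent.
a_all = np.concatenate([a_matched, a_extra])
b_all = np.concatenate([b_matched, b_extra])
unpaired = stats.ttest_ind(a_all, b_all, equal_var=False)

print(paired.pvalue, unpaired.pvalue)
```

Neither option uses all of the data together with the correct correlation structure, which is the gap the abstract says the PB-transformed tests are meant to close.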