Ouyang Weiwei, An Qiang, Zhao Jinying, Qin Huaizhen
Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, 1440 Canal Street, Suite 2001, New Orleans, LA, 70112, USA.
Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, 1440 Canal Street, New Orleans, LA, 70112, USA.
BMC Bioinformatics. 2016 Dec 6;17(1):497. doi: 10.1186/s12859-016-1393-y.
In functional genomics studies, tests of mean heterogeneity are widely used to identify differentially expressed genes, that is, genes whose mean expression levels differ across experimental conditions. Variance heterogeneity (i.e., the difference between condition-specific variances) of gene expression levels is usually either neglected or treated as a nuisance to be corrected for. Mean heterogeneity in the expression level of a gene reflects one aspect of how its distribution changes, and variance heterogeneity induced by a change in condition may reflect another. A change in condition may alter both the mean and some higher-order characteristics of the expression-level distributions of susceptible genes.
In this report, we introduce the concept of mean-variance differentially expressed (MVDE) genes, whose expression means and variances are both sensitive to a change in experimental condition. We proved mathematically that existing mean heterogeneity tests and variance heterogeneity tests are independent under the null hypothesis. Based on this independence, we proposed an integrative mean-variance test (IMVT) that combines gene-wise mean heterogeneity and variance heterogeneity induced by condition change. The IMVT outperformed its competitors in comprehensive simulations under normal and Laplace settings. For moderate sample sizes, the IMVT controlled type I error rates well, as did the existing mean heterogeneity tests (the Welch t test (WT) and the moderated Welch t test (MWT)) and the procedure of separate tests on mean and variance heterogeneities (SMVT), whereas the likelihood ratio test (LRT) severely inflated type I error rates. In the presence of variance heterogeneity, the IMVT was noticeably more powerful than all of the valid mean heterogeneity tests. Application to gene expression profiles of peripheral circulating B cells provided solid evidence of informative variance heterogeneity. After adjusting for background data structure, the IMVT replicated previous discoveries and identified novel experiment-wide significant MVDE genes.
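The role of null independence in building an integrative test can be illustrated with a minimal sketch. The abstract does not state which combining rule the IMVT uses, so the example below assumes Fisher's method, a standard way to merge independent p-values; the function name `fisher_combine` and its arguments are hypothetical. When the mean-test and variance-test p-values are independent under the null, the statistic -2(ln p1 + ln p2) follows a chi-square distribution with 4 degrees of freedom, whose tail probability has a closed form.

```python
import math


def fisher_combine(p_mean: float, p_var: float) -> float:
    """Combine two independent p-values with Fisher's method (k = 2).

    Assumption: this is an illustrative combining rule, not necessarily
    the exact rule used by the IMVT described in the abstract.
    """
    for p in (p_mean, p_var):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    # Fisher statistic: -2 * (ln p1 + ln p2), chi-square with 4 df under the null.
    stat = -2.0 * (math.log(p_mean) + math.log(p_var))
    # The chi-square(4) survival function is exp(-x/2) * (1 + x/2).
    return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)


# Hypothetical gene: modest evidence from each test separately.
combined = fisher_combine(0.05, 0.05)
print(f"combined p-value: {combined:.4f}")
```

Under this assumed rule, two individually modest p-values yield a smaller combined p-value, which is the intuition behind the power gain the abstract reports when variance heterogeneity is present. The validity of the chi-square reference distribution depends on the independence result stated above.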
Our results indicate a substantial potential gain from integrating informative variance heterogeneity after adjusting for global confounders and background data structure. The proposed integrative test summarizes the impact of condition change on the expression distributions of susceptible genes better than its existing competitors do. Therefore, functional genomics analyses should explicitly exploit the variance heterogeneity induced by condition change.