Zhu Lingxue, Lei Jing, Devlin Bernie, Roeder Kathryn
Department of Statistics, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA.
Department of Psychiatry and Human Genetics, University of Pittsburgh School of Medicine, 3811 O'Hara Street, Pittsburgh, Pennsylvania 15213, USA.
Ann Appl Stat. 2017 Sep;11(3):1810-1831. doi: 10.1214/17-AOAS1062. Epub 2017 Oct 5.
Scientists routinely compare gene expression levels in cases versus controls, in part to identify genes associated with a disease. Similarly, detecting case-control differences in co-expression among genes can be critical to understanding complex human diseases; however, statistical methods have been limited by the high-dimensional nature of this problem. In this paper, we construct a sparse-Leading-Eigenvalue-Driven (sLED) test for comparing two high-dimensional covariance matrices. By focusing on the spectrum of the differential matrix, sLED provides a novel perspective that accommodates what we assume to be common in gene expression data, namely sparse and weak signals, and it is closely related to Sparse Principal Component Analysis. We prove that sLED achieves full power asymptotically under mild assumptions, and simulation studies verify that it outperforms other existing procedures under many biologically plausible scenarios. Applying sLED to the largest gene-expression dataset obtained from post-mortem brain tissue from schizophrenia patients and controls, we provide a novel list of genes implicated in schizophrenia and reveal intriguing patterns in gene co-expression change for schizophrenia subjects. We also illustrate that sLED can be generalized to compare other gene-gene "relationship" matrices that are of practical interest, such as weighted adjacency matrices.
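As a rough illustration of the idea of testing through the spectrum of the differential matrix, the following minimal sketch computes a sparse leading eigenvalue of D = cov(Y) - cov(X) and of -D, and takes the larger of the two as the test statistic. Several parts are assumptions that the abstract does not state: the function names are hypothetical, sparsity is imposed as a hard limit of k nonzero loadings, the sparse eigenvalue is approximated by a truncated power iteration (a stand-in, not necessarily the solver used in the paper), and the null distribution is calibrated by permuting group labels.

```python
import numpy as np


def sparse_leading_eigenvalue(D, k, n_iter=100):
    """Approximate max v' D v over unit vectors v with at most k nonzeros.

    Uses a truncated power iteration (an illustrative stand-in solver).
    """
    p = D.shape[0]
    # Gershgorin-style shift makes the matrix positive semidefinite,
    # so that power iterations ascend the quadratic form.
    shift = np.abs(D).sum(axis=1).max()
    A = D + shift * np.eye(p)
    # Start from the coordinate with the largest diagonal entry of D.
    v = np.zeros(p)
    v[np.argmax(np.diag(D))] = 1.0
    for _ in range(n_iter):
        w = A @ v
        keep = np.argsort(np.abs(w))[-k:]  # indices of the k largest |entries|
        v_new = np.zeros(p)
        v_new[keep] = w[keep]
        norm = np.linalg.norm(v_new)
        if norm == 0:
            break
        v = v_new / norm
    return float(v @ D @ v)


def sled_statistic(X, Y, k):
    """Larger of the sparse leading eigenvalues of D and -D."""
    D = np.cov(Y, rowvar=False) - np.cov(X, rowvar=False)
    return max(sparse_leading_eigenvalue(D, k),
               sparse_leading_eigenvalue(-D, k))


def sled_permutation_test(X, Y, k=5, n_perm=200, seed=0):
    """Permutation p-value for H0: cov(X) == cov(Y) (assumed calibration)."""
    rng = np.random.default_rng(seed)
    # Center each group so that mean differences do not drive the result.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    observed = sled_statistic(X, Y, k)
    Z = np.vstack([X, Y])
    n = X.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        if sled_statistic(Z[idx[:n]], Z[idx[n:]], k) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Here X and Y are samples-by-genes arrays for controls and cases. The choice of k trades off sensitivity to sparse signals against noise accumulated from additional coordinates; in this sketch it is a tuning parameter chosen by the user.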