Chen Lin S, Paul Debashis, Prentice Ross L, Wang Pei
Department of Health Studies, The University of Chicago, IL.
J Am Stat Assoc. 2011 Dec;106(496):1345-1360. doi: 10.1198/jasa.2011.ap10599.
Recent proteomic studies have identified proteins related to specific phenotypes. In addition to marginal association analysis for individual proteins, analyzing pathways (functionally related sets of proteins) may yield additional valuable insights. Identifying pathways that differ between phenotypes can be conceptualized as a multivariate hypothesis testing problem: whether the mean vector \(\mu\) of a \(p\)-dimensional random vector \(X\) is \(\mu_0\). Proteins within the same biological pathway may correlate with one another in a complicated way, and type I error rates can be inflated if such correlations are incorrectly assumed to be absent. The inflation tends to be more pronounced when the sample size is very small or there is a large amount of missingness in the data, as is frequently the case in proteomic discovery studies. To tackle these challenges, we propose a regularized Hotelling's \(T^2\) (RHT) statistic together with a non-parametric testing procedure, which effectively controls the type I error rate and maintains good power in the presence of complex correlation structures and missing data patterns. We investigate asymptotic properties of the RHT statistic under pertinent assumptions and compare the test performance with four existing methods through simulation examples. We apply the RHT test to a hormone therapy proteomics data set, and identify several interesting biological pathways for which blood serum concentrations changed following hormone therapy initiation.
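To make the testing problem concrete, the following is a minimal sketch of a ridge-regularized Hotelling-type statistic with a resampling p-value. It is an illustration under stated assumptions, not the procedure from the paper. The abstract does not give the formula, so the form n (x̄ − μ0)ᵀ (S + λI)⁻¹ (x̄ − μ0) is assumed here. The sign-flipping resampling scheme is a swapped-in non-parametric test that relies on symmetry about μ0 under the null. The function names, the fixed `lam` value, and the complete-data requirement (no missing values) are all hypothetical simplifications.

```python
import numpy as np


def regularized_hotelling_t2(x, mu0, lam):
    """Ridge-regularized one-sample Hotelling-type statistic (assumed form)."""
    n, p = x.shape
    diff = x.mean(axis=0) - mu0
    # p x p sample covariance; atleast_2d covers the p == 1 case.
    s = np.atleast_2d(np.cov(x, rowvar=False))
    # Adding lam * I keeps the matrix invertible even when p >= n.
    return float(n * diff @ np.linalg.solve(s + lam * np.eye(p), diff))


def sign_flip_pvalue(x, mu0, lam, n_resamples=999, seed=0):
    """Resampling p-value; assumes rows are symmetric about mu0 under the null."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    zero = np.zeros(p)
    centered = x - mu0
    observed = regularized_hotelling_t2(centered, zero, lam)
    exceed = 0
    for _ in range(n_resamples):
        # Flip the sign of each whole row, which preserves within-row correlation.
        signs = rng.choice([-1.0, 1.0], size=(n, 1))
        stat = regularized_hotelling_t2(signs * centered, zero, lam)
        exceed += int(stat >= observed)
    # The +1 terms count the observed statistic as one of the resamples.
    return (exceed + 1) / (n_resamples + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(8, 20))  # 8 samples, 20 proteins: p > n
    print(regularized_hotelling_t2(x, np.zeros(20), lam=1.0))
    print(sign_flip_pvalue(x, np.zeros(20), lam=1.0))
```

With 8 samples and 20 variables the unregularized sample covariance is singular, so the classical Hotelling statistic is undefined; the `lam * I` term is what makes the linear solve well posed. How `lam` should be chosen and how missing values are handled are not covered by this sketch.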