Department of Biostatistics, University of Florida, Gainesville, FL, USA.
Stat Med. 2012 Apr 13;31(8):724-42. doi: 10.1002/sim.4435. Epub 2011 Dec 9.
High-throughput technology in metabolomics, genomics, and proteomics gives rise to high-dimension, low-sample-size data when the number of metabolites, genes, or proteins exceeds the sample size. For a limited class of designs, the classic 'univariate approach' for Gaussian repeated measures can provide a reasonable global hypothesis test. We derive new tests that not only handle more variables than subjects accurately but also give valid analyses for data with complex between-subject and within-subject designs. Our derivations capitalize on the dual of the error covariance matrix, which is nonsingular when the number of variables exceeds the sample size, to ensure correct statistical inference and enhance computational efficiency. Simulation studies demonstrate that the new tests accurately control the Type I error rate and have reasonable power even with a handful of subjects and a thousand outcome variables. We apply the new methods to a study of the metabolic consequences of vitamin B6 deficiency. Free software implementing the new methods applies to a wide range of designs, including one-group pre-intervention and post-intervention comparisons, multiple parallel-group comparisons with one-way or factorial designs, and the adjustment and evaluation of covariate effects.
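The abstract gives no formulas, so the following is only a minimal sketch of why a "dual" matrix helps, under our own assumptions: outcomes Y (N subjects by p variables) follow a Gaussian linear model with design matrix X, residuals are rotated onto an orthonormal basis of the residual space (nu = N - rank(X) dimensions), and the "dual" is taken to be the nu-by-nu matrix formed by swapping the order of the residual cross-product. This may not match the paper's exact definition. All function names, the two-group design, and the simulated data are hypothetical, and the Box-type sphericity measure tr(S)^2 / (b tr(S^2)) is used only to show that trace-based quantities can be computed from the dual; it is not the paper's test statistic.

```python
import numpy as np


def residual_basis(X):
    """Hypothetical helper: orthonormal basis (N x nu) for the complement of col(X)."""
    U, _, _ = np.linalg.svd(X, full_matrices=True)
    r = np.linalg.matrix_rank(X)
    # The first r left singular vectors span col(X); the remaining N - r span its complement.
    return U[:, r:]


def rotated_residuals(Y, X):
    """Rotate Y into the residual space, so that R'R equals Y'(I - H)Y (H = hat matrix)."""
    return residual_basis(X).T @ Y  # shape (nu, p)


def primal_covariance(R):
    """p x p error covariance estimate; its rank is at most nu, so it is singular when p > nu."""
    return R.T @ R / R.shape[0]


def dual_covariance(R):
    """nu x nu 'dual' (our assumed definition); it shares the nonzero eigenvalues of the primal."""
    return R @ R.T / R.shape[0]


def trace_epsilon(S, b):
    """Box-type sphericity measure tr(S)^2 / (b * tr(S^2)) for b outcome columns.

    For symmetric S, sum(S * S) equals tr(S @ S) without a full matrix product.
    """
    return np.trace(S) ** 2 / (b * np.sum(S * S))


# Hypothetical example: a handful of subjects, many outcome variables.
rng = np.random.default_rng(2012)
N, p = 8, 1000
group = np.repeat([0.0, 1.0], N // 2)        # two parallel groups of 4 subjects
X = np.column_stack([np.ones(N), group])     # intercept + group indicator, rank 2
Y = rng.standard_normal((N, p))              # simulated outcomes, not real data

R = rotated_residuals(Y, X)                  # nu = 8 - 2 = 6 rows
sigma_hat = primal_covariance(R)             # 1000 x 1000, rank 6: singular
dual = dual_covariance(R)                    # 6 x 6, full rank here: nonsingular

# Trace-based quantities agree, but the dual never forms a p x p matrix.
eps_primal = trace_epsilon(sigma_hat, p)
eps_dual = trace_epsilon(dual, p)
print("rank of primal:", np.linalg.matrix_rank(sigma_hat))
print("rank of dual:", np.linalg.matrix_rank(dual))
print("epsilon (primal, dual):", eps_primal, eps_dual)
```

The design choice shows up in the cost: forming R'R takes on the order of nu*p^2 operations, while RR' takes on the order of nu^2*p, which matters when p is in the thousands and nu is a handful.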