Nandy Debmalya, Ghosh Debashis, Kechris Katerina
Department of Biostatistics & Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.
Center for Innovative Design & Analysis, Department of Biostatistics & Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.
Metabolites. 2025 Jan 8;15(1):28. doi: 10.3390/metabo15010028.
Due to scientific advancements in high-throughput data production technologies, omics studies, such as genomics and metabolomics, often give rise to numerous measurements per sample/subject containing several noisy variables that potentially cloud the true signals relevant to the desired study outcome(s). Therefore, correcting for multiple testing is critical while performing any statistical test of significance to minimize the chances of false or missed discoveries. Such correction practice is commonplace in genome-wide association studies (GWAS) but is also becoming increasingly relevant to metabolome-wide association studies (MWAS). However, many existing procedures may be too conservative or too lenient, only assume a linear association between the features, or have not been evaluated on metabolomics data.
One such multiple testing correction strategy is to estimate the number of statistically independent tests, called the effective number of tests, from an eigen-analysis of the correlation matrix between the features. This effective number is then used in a single-step adjustment to obtain the pointwise significance level. We propose a modification to this p-value adjustment that replaces Pearson's correlation with a more general measure of association between two predictors, the distance correlation, with a specific focus on MWAS.
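The procedure in the paragraph above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' DisCo P-ad implementation: it builds a matrix of pairwise sample distance correlations (Székely, Rizzo & Bakirov, 2007) in place of the Pearson correlation matrix, estimates the effective number of tests with Nyholt's (2004) eigenvalue-variance formula as one example of a GWAS-style estimator, and applies a Šidák single-step correction. The function names are hypothetical, and the choice of estimator, of Šidák rather than Bonferroni, and of dCor rather than squared dCor are all assumptions.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (Szekely, Rizzo & Bakirov, 2007)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def double_centered(v):
        # Pairwise |v_i - v_j|, minus row and column means, plus the grand mean.
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

    a, b = double_centered(x), double_centered(y)
    dcov2 = (a * b).mean()                   # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()  # product of squared distance variances
    if dvar2 <= 0.0:                         # a constant feature carries no information
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def association_matrix(X, method="dcor"):
    """p x p association matrix between the columns (features) of an n x p array X."""
    X = np.asarray(X, dtype=float)
    if method == "pearson":
        return np.corrcoef(X, rowvar=False)
    p = X.shape[1]
    R = np.eye(p)                            # each feature is perfectly associated with itself
    for i in range(p):
        for j in range(i + 1, p):
            R[i, j] = R[j, i] = distance_correlation(X[:, i], X[:, j])
    return R


def nyholt_meff(R):
    """Effective number of tests from the eigenvalues of R (Nyholt, 2004): assumed estimator."""
    m = R.shape[0]
    if m == 1:
        return 1.0
    lam = np.linalg.eigvalsh(R)              # R is symmetric, so eigvalsh applies
    return float(1.0 + (m - 1) * (1.0 - np.var(lam, ddof=1) / m))


def sidak_threshold(alpha, meff):
    """Single-step Sidak per-test level, treating meff tests as independent."""
    return 1.0 - (1.0 - alpha) ** (1.0 / meff)


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 201)
    X = np.column_stack([x, x ** 2])         # dependent, but Pearson-uncorrelated
    for method in ("pearson", "dcor"):
        meff = nyholt_meff(association_matrix(X, method))
        print(f"{method}: M_eff = {meff:.3f}, per-test alpha = {sidak_threshold(0.05, meff):.4f}")
```

In the demo, a feature and its square on a symmetric grid have a Pearson correlation of essentially zero, so the Pearson-based estimate counts them as two independent tests. The distance correlation detects the nonlinear dependence, which lowers the effective number of tests and yields a less stringent per-test threshold. With unit-diagonal matrices this estimator reduces to M minus (2/M) times the sum of squared off-diagonal entries, so any association the measure picks up shrinks the estimate.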
We assessed common GWAS p-value adjustment procedures and one tailored for MWAS, all of which rely on eigen-analysis of the Pearson correlation matrix. Our study, which varied sample size-to-feature ratios, response types, and metabolite groupings, highlights the superior performance of the distance correlation.
We propose the distance-correlation-based p-value adjustment (DisCo P-ad) as a novel modification that can enhance existing eigen-analysis-based multiple testing correction procedures by increasing power or reducing false positives. While our focus is on metabolomics, DisCo P-ad can also readily be applied to other high-dimensional omics studies.