Division of Health Statistics, School of Public Health, Hebei Medical University, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China.
Hebei Key Laboratory of Environment and Human Health, 361 East Zhongshan Road, Shijiazhuang, Hebei 050017, P.R. China.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae456.
Set-based association analysis is a valuable tool for studying the etiology of complex diseases in genome-wide association studies, as it allows variants in a region or group to be tested jointly. Two common types of single nucleotide polymorphism (SNP)-disease functional models are recognized when evaluating the joint function of a set of SNPs: the cumulative weak signal model, in which multiple functional variants with small effects contribute to disease risk, and the dominating strong signal model, in which a few functional variants with large effects contribute to disease risk. However, existing methods have two main limitations that reduce their power. First, they typically consider only one disease-SNP association model, which can cause substantial power loss if the model is misspecified. Second, they do not account for the high-dimensional nature of SNPs, leading to low power or a high false-positive rate. In this study, we address these challenges with a high-dimensional inference procedure that fits many SNPs simultaneously in a regression model. We also propose an omnibus testing procedure that employs a robust and powerful P-value combination method to enhance the power of SNP-set association analysis. Extensive simulation studies and a real data analysis demonstrate that our set-based high-dimensional inference strategy is both flexible and computationally efficient and can substantially improve the power of SNP-set association analysis.
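The omnibus idea, combining P-values from tests tuned to different signal models so that no single model has to be correct, can be sketched in a few lines. The abstract does not say which P-value combination method is used; the Cauchy combination rule below is an assumption chosen purely for illustration, and the names `cauchy_combine`, `p_weak`, and `p_strong` are hypothetical.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine P-values with the Cauchy combination rule.

    Illustrative assumption: the source does not name its combination
    method. Weights default to equal and are normalized to sum to 1.
    """
    if not pvalues:
        raise ValueError("at least one P-value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("P-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    total = sum(weights)
    weights = [w / total for w in weights]

    # Map each P-value to a standard Cauchy variate and take the weighted sum.
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))

    # Upper-tail probability of a standard Cauchy distribution at `stat`.
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical inputs: one P-value from a test suited to many weak signals,
# one from a test suited to a few strong signals.
p_weak = 0.8
p_strong = 0.01
p_omnibus = cauchy_combine([p_weak, p_strong])
```

In this toy case the combined P-value stays small because one of the two component tests is strongly significant, which is the behaviour an omnibus test needs when the true signal model is unknown.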