Department of Biostatistics, Columbia University, New York, NY 10032, USA; Department of Biostatistics, City University of Hong Kong, Hong Kong SAR, China; School of Data Science, City University of Hong Kong, Hong Kong SAR, China.
Department of Biostatistics, Columbia University, New York, NY 10032, USA.
Am J Hum Genet. 2022 Oct 6;109(10):1761-1776. doi: 10.1016/j.ajhg.2022.08.013. Epub 2022 Sep 22.
Family-based designs can eliminate confounding due to population substructure and can distinguish direct from indirect genetic effects, but these designs are underpowered due to limited sample sizes. Here, we propose KnockoffTrio, a statistical method to identify putative causal genetic variants in the father-mother-child trio design, built upon a recently developed knockoff framework in statistics. KnockoffTrio controls the false discovery rate (FDR) in the presence of arbitrary correlations among tests and is less conservative, and thus more powerful, than conventional methods that control the family-wise error rate via Bonferroni correction. Furthermore, KnockoffTrio is not restricted to family-based association tests and can be used in conjunction with more powerful, potentially nonlinear models to improve the power of standard family-based tests. We show, using empirical simulations, that KnockoffTrio can prioritize causal variants over associations due to linkage disequilibrium and can provide protection against confounding due to population stratification. In applications to 14,200 trios from three study cohorts for autism spectrum disorders (ASDs), namely AGP, SPARK, and SSC, we show that KnockoffTrio can identify multiple significant associations that are missed by conventional tests applied to the same data. In particular, we replicate known ASD association signals with variants in several genes, such as MACROD2, NRXN1, PRKAR1B, CADM2, PCDH9, and DOCK4, and identify additional associations with variants in other genes, including ARHGEF10, SLC28A1, ZNF589, and HINT1, at an FDR of 10%.
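To make the FDR-control step concrete, the following is a minimal sketch of the generic knockoff+ selection rule from the knockoff framework, not the KnockoffTrio implementation itself. It assumes a feature statistic W_j has already been computed for each variant, with large positive values indicating that the original variant looks more associated than its synthetic knockoff copy. How KnockoffTrio builds trio knockoffs and computes its statistics is not described in this abstract, so the function names `knockoff_threshold` and `knockoff_select` and the toy values are hypothetical.

```python
import numpy as np


def knockoff_threshold(w, fdr=0.10, offset=1):
    """Return the smallest t > 0 whose estimated false discovery proportion
    (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) is at most `fdr`.

    offset=1 gives the knockoff+ rule; returns inf if no t qualifies.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = np.sort(np.abs(w[w != 0]))
    for t in candidates:
        # Negative statistics stand in for the number of false positives.
        false_est = offset + np.sum(w <= -t)
        n_selected = max(1, int(np.sum(w >= t)))
        if false_est / n_selected <= fdr:
            return float(t)
    return float("inf")


def knockoff_select(w, fdr=0.10, offset=1):
    """Indices of variants whose statistic reaches the knockoff threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_threshold(w, fdr=fdr, offset=offset)
    return np.flatnonzero(w >= t)


# Hypothetical statistics for 14 variants (illustrative values only).
w_toy = [5, 4, 3.5, 3, 2.5, 2, 1.8, 1.5, 1.2, 1.0, 0.9, -0.5, 0.3, -0.2]
selected = knockoff_select(w_toy, fdr=0.10)
```

Because the threshold is driven by the sign symmetry of W_j under the null rather than by a per-test significance level, the rule does not divide the error budget by the number of tests as a Bonferroni correction does, which is the source of the power gain described above.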