Hecker Julian, Townes F William, Kachroo Priyadarshini, Laurie Cecelia, Lasky-Su Jessica, Ziniti John, Cho Michael H, Weiss Scott T, Laird Nan M, Lange Christoph
Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
Department of Computer Science, Princeton University, Princeton, NJ 08540-5233, USA.
Bioinformatics. 2021 Apr 1;36(22-23):5432-5438. doi: 10.1093/bioinformatics/btaa1055.
Analysis of rare variants in family-based studies remains a challenge. Transmission-based approaches are robust against population stratification, but evaluating the significance of test statistics with asymptotic theory can be imprecise. In addition, power depends heavily on the choice of test statistic and on the underlying genetic architecture of the locus, which is generally unknown.
In our proposed framework, we use the FBAT haplotype algorithm to obtain the conditional offspring genotype distribution under the null hypothesis, given the sufficient statistic. Based on this conditional distribution, the significance of virtually any association test statistic can be evaluated by simulation or exact computation, without the need for asymptotic approximations. Besides standard linear burden-type statistics, our approach can therefore also evaluate other test statistics, such as variance components statistics, higher criticism approaches, and maximum-single-variant statistics, for which asymptotic theory can be complicated or does not provide accurate approximations for rare variant data. The resulting P-values can in turn be combined with tests such as the aggregated Cauchy association test (ACAT). In simulation studies, we show that our framework outperforms existing approaches for family-based studies in several scenarios. We also applied our methodology to a TOPMed whole-genome sequencing dataset with 897 asthmatic trios from Costa Rica.
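The simulation idea can be illustrated with a minimal Python sketch. It is not the FBAT haplotype algorithm: it assumes affected-offspring trios with fully phased parental haplotypes and no recombination within the region, so that under the null each parent transmits one of its two haplotypes with probability 1/2. All function names, the array layout, and the unnormalized statistics are hypothetical choices made for illustration. The ACAT combination is shown with equal weights.

```python
import numpy as np

def transmit(parent_haps, rng):
    """Draw offspring genotypes under the null (Mendelian transmission).

    parent_haps: 0/1 array of shape (n_trios, 2, 2, n_variants), indexed as
    (trio, parent, haplotype, variant). Assumes known phase (illustration only).
    """
    n_trios = parent_haps.shape[0]
    # For every parent, pick which of its two haplotypes is transmitted.
    pick = rng.integers(0, 2, size=(n_trios, 2))
    trio_idx = np.arange(n_trios)[:, None]
    parent_idx = np.arange(2)[None, :]
    transmitted = parent_haps[trio_idx, parent_idx, pick]  # (n_trios, 2, n_variants)
    return transmitted.sum(axis=1)                         # offspring genotypes

def burden_stat(offspring, expected, weights):
    """Squared weighted sum of genotype residuals over trios and variants."""
    return float((((offspring - expected) @ weights).sum()) ** 2)

def max_single_variant_stat(offspring, expected):
    """Largest squared per-variant residual sum (unnormalized, for simplicity)."""
    return float(np.max((offspring - expected).sum(axis=0) ** 2))

def mc_pvalue(observed_offspring, parent_haps, stat, n_sim=2000, seed=0):
    """Monte Carlo P-value of `stat` under the conditional null distribution."""
    rng = np.random.default_rng(seed)
    # Expected offspring genotype given the parental haplotypes.
    expected = parent_haps.mean(axis=2).sum(axis=1)
    obs = stat(observed_offspring, expected)
    hits = sum(stat(transmit(parent_haps, rng), expected) >= obs
               for _ in range(n_sim))
    return (1 + hits) / (1 + n_sim)

def acat(pvalues):
    """Equal-weight Cauchy combination of P-values (ACAT)."""
    p = np.asarray(pvalues, dtype=float)
    t = np.mean(np.tan((0.5 - p) * np.pi))
    return 0.5 - np.arctan(t) / np.pi

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy data: 100 trios, 5 rare variants, parental haplotypes drawn at random.
    haps = (rng.random((100, 2, 2, 5)) < 0.05).astype(int)
    kids = transmit(haps, rng)          # offspring generated under the null
    w = np.ones(5)
    p_burden = mc_pvalue(kids, haps, lambda x, e: burden_stat(x, e, w))
    p_max = mc_pvalue(kids, haps, max_single_variant_stat)
    print(p_burden, p_max, acat([p_burden, p_max]))
```

Because every statistic is evaluated against the same simulated null draws, any function of the offspring genotypes can be plugged in as `stat` without deriving its asymptotic distribution. In this sketch the combined ACAT value is treated as a P-value directly; it could also be calibrated against the same simulations.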
FBAT software is available at https://sites.google.com/view/fbatwebpage. Simulation code is available at https://github.com/julianhecker/FBAT_rare_variant_test_simulations. Whole-genome sequencing data for 'NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica' is available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000988.v4.p1.
Supplementary data are available at Bioinformatics online.