
A unifying framework for rare variant association testing in family-based designs, including higher criticism approaches, SKATs, and burden tests.

Authors

Hecker Julian, Townes F William, Kachroo Priyadarshini, Laurie Cecelia, Lasky-Su Jessica, Ziniti John, Cho Michael H, Weiss Scott T, Laird Nan M, Lange Christoph

Affiliations

Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.

Department of Computer Science, Princeton University, Princeton, NJ 08540-5233, USA.

Publication

Bioinformatics. 2021 Apr 1;36(22-23):5432-5438. doi: 10.1093/bioinformatics/btaa1055.

DOI:10.1093/bioinformatics/btaa1055

PMID:33367522

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8016468/

Abstract

MOTIVATION

Analysis of rare variants in family-based studies remains a challenge. Transmission-based approaches provide robustness against population stratification, but the evaluation of the significance of test statistics based on asymptotic theory can be imprecise. Also, power will depend heavily on the choice of the test statistic and on the underlying genetic architecture of the locus, which will be generally unknown.

RESULTS

In our proposed framework, we use the FBAT haplotype algorithm to obtain the conditional offspring genotype distribution under the null hypothesis, given the sufficient statistic. Based on this conditional offspring genotype distribution, the significance of virtually any association test statistic can be evaluated by simulation or exact computation, without the need for asymptotic approximations. Besides standard linear burden-type statistics, this allows our approach to evaluate other test statistics, such as variance-component statistics, higher criticism approaches, and maximum single-variant statistics, for which the asymptotic theory can be complicated or does not provide accurate approximations for rare variant data. Based on these P-values, combined test statistics such as the aggregated Cauchy association test (ACAT) can also be used. In simulation studies, we show that our framework outperforms existing approaches for family-based studies in several scenarios. We also applied our methodology to a TOPMed whole-genome sequencing dataset of 897 asthmatic trios from Costa Rica.
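The idea in the Results paragraph can be illustrated with a minimal Python sketch. It is not the FBAT software or the authors' implementation, and all function names are hypothetical. It assumes complete trios whose parental haplotypes are already phased and known. Under that assumption, the conditional null distribution reduces to Mendelian transmission (each parent passes on either haplotype with probability 1/2), so the sketch skips the haplotype-reconstruction step. It computes per-variant FBAT-type scores U_j = sum_i t_i (X_ij - E[X_ij | parents]), with the trait offset t_i defaulting to 1 (affected offspring) and equal variant weights. From these scores it forms a burden, a SKAT-type, a maximum single-variant, and one common form of the higher criticism statistic. Each statistic gets a Monte Carlo p-value by resampling offspring genotypes, and the four p-values are then combined with an equal-weight ACAT.

```python
import math
import numpy as np


def simulate_offspring(fh, mh, rng):
    """Draw offspring genotypes for all trios under H0 (Mendelian transmission).

    fh, mh: int arrays of shape (n_trios, 2, n_variants) holding phased 0/1
    paternal and maternal haplotypes (assumed known in this sketch).
    Returns minor-allele counts with shape (n_trios, n_variants).
    """
    rows = np.arange(fh.shape[0])
    f = rng.integers(0, 2, size=fh.shape[0])  # transmitted paternal haplotype
    m = rng.integers(0, 2, size=mh.shape[0])  # transmitted maternal haplotype
    return fh[rows, f] + mh[rows, m]


def score_stats(x, expected, t, var, w):
    """Burden, SKAT-type, max-single-variant and higher-criticism statistics."""
    u = t @ (x - expected)                    # per-variant scores U_j
    z = u / np.sqrt(var)                      # standardized scores
    p = np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2))  # two-sided normal p
    p = np.sort(np.clip(p, 1e-15, 1 - 1e-15))
    k = np.arange(1, p.size + 1)
    hc = np.max(np.sqrt(p.size) * (k / p.size - p) / np.sqrt(p * (1 - p)))
    return {
        "burden": float(abs(w @ u)),
        "skat": float(np.sum((w * u) ** 2)),
        "max_single": float(np.max(np.abs(z))),
        "higher_criticism": float(hc),
    }


def acat(pvals):
    """Equal-weight Cauchy combination of p-values (ACAT)."""
    stat = np.mean(np.tan((0.5 - np.asarray(pvals, dtype=float)) * np.pi))
    return 0.5 - math.atan(stat) / math.pi


def mc_family_test(x_obs, fh, mh, t=None, w=None, n_sim=2000, seed=0):
    """Monte Carlo p-values from the conditional (Mendelian) null distribution."""
    rng = np.random.default_rng(seed)
    fh, mh = np.asarray(fh, dtype=int), np.asarray(mh, dtype=int)
    x_obs = np.asarray(x_obs, dtype=int)
    expected = fh.mean(axis=1) + mh.mean(axis=1)  # E[X | parental haplotypes]
    # Null variance of each transmitted allele is (h1 - h2)^2 / 4 per parent.
    het = ((fh[:, 0] - fh[:, 1]) ** 2 + (mh[:, 0] - mh[:, 1]) ** 2) / 4
    t = np.ones(x_obs.shape[0]) if t is None else np.asarray(t, dtype=float)
    var = (t ** 2) @ het
    keep = var > 0  # drop variants with zero null variance (non-informative)
    if not keep.any():
        raise ValueError("no informative variants")
    fh, mh, x_obs = fh[:, :, keep], mh[:, :, keep], x_obs[:, keep]
    expected, var = expected[:, keep], var[keep]
    w = np.ones(var.size) if w is None else np.asarray(w, dtype=float)[keep]

    obs = score_stats(x_obs, expected, t, var, w)
    exceed = dict.fromkeys(obs, 0)
    for _ in range(n_sim):
        sim = score_stats(simulate_offspring(fh, mh, rng), expected, t, var, w)
        for name in obs:
            exceed[name] += int(sim[name] >= obs[name])
    # (1 + b) / (1 + B) keeps each Monte Carlo p-value valid (never exactly 0).
    pvals = {name: (1 + exceed[name]) / (1 + n_sim) for name in obs}
    pvals["acat"] = acat(list(pvals.values()))
    return pvals
```

For example, `mc_family_test(x, fh, mh, n_sim=10000)` returns a dictionary of Monte Carlo p-values keyed by statistic, plus their ACAT combination. The per-statistic p-values do not rely on asymptotic theory. The ACAT step uses a Cauchy tail approximation, which is designed to tolerate dependence among the combined p-values.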

AVAILABILITY AND IMPLEMENTATION

FBAT software is available at https://sites.google.com/view/fbatwebpage. Simulation code is available at https://github.com/julianhecker/FBAT_rare_variant_test_simulations. Whole-genome sequencing data for 'NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica' is available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000988.v4.p1.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
