UK Biobank Whole-Exome Sequence Binary Phenome Analysis with Robust Region-Based Rare-Variant Test.

Affiliations

Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA.

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

Publication information

Am J Hum Genet. 2020 Jan 2;106(1):3-12. doi: 10.1016/j.ajhg.2019.11.012. Epub 2019 Dec 19.

Abstract

In biobank data analysis, most binary phenotypes have unbalanced case-control ratios, and this can cause inflation of type I error rates. Recently, a saddlepoint approximation (SPA)-based single-variant test has been developed to provide an accurate and scalable method to test for associations of such phenotypes. For gene- or region-based multiple-variant tests, a few methods exist that can adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable for large data analyses. To address these problems, we propose SKAT- and SKAT-O-type region-based tests; in these tests, the single-variant score statistic is calibrated based on SPA and efficient resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p values. In contrast, when the case-control ratio is 1:99, the unadjusted approach has greatly inflated type I error rates (90 times that of exome-wide sequencing α = 2.5 × 10⁻⁶). Additionally, the proposed method has similar computation time to the unadjusted approaches and is scalable for large sample data. In our application, the UK Biobank whole-exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare-variant associations with p value < 10⁻⁷, including the associations between JAK2 and myeloproliferative disease, HOXB13 and cancer of prostate, and F11 and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server, and this availability can help facilitate the identification of the genetic basis of complex diseases.
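To illustrate why an unbalanced case-control ratio inflates type I error and how a saddlepoint approximation can help, the following is a minimal Python sketch, not the authors' implementation. It assumes an intercept-only null model (every subject has the same case probability), a single rare variant, and a one-sided upper-tail test of the score statistic S = Σ gᵢ(yᵢ − μᵢ). The function names, the toy genotype and phenotype vectors, and the use of the Lugannani-Rice/Barndorff-Nielsen tail formula without a continuity correction are all assumptions made for illustration; the region-based SKAT and SKAT-O combination and the efficient resampling step are not shown.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def cgf_derivs(t, g, mu):
    """K(t), K'(t), K''(t) of S = sum_i g_i * (Y_i - mu_i), Y_i ~ Bernoulli(mu_i)."""
    k0 = k1 = k2 = 0.0
    for gi, mi in zip(g, mu):
        if gi == 0:
            continue  # non-carriers contribute nothing to S
        e = math.exp(gi * t)
        denom = 1.0 - mi + mi * e
        k0 += math.log(denom) - t * gi * mi
        k1 += gi * mi * e / denom - gi * mi
        k2 += gi * gi * mi * (1.0 - mi) * e / denom ** 2
    return k0, k1, k2

def normal_tail(s, g, mu):
    """Unadjusted upper-tail p value from the normal approximation."""
    _, _, var = cgf_derivs(0.0, g, mu)  # K''(0) is the variance of S
    return normal_upper_tail(s / math.sqrt(var))

def spa_tail(s, g, mu, t_max=64.0):
    """Saddlepoint upper-tail p value for s > 0 (no continuity correction)."""
    if s <= 0:
        raise ValueError("this sketch only handles s > 0")
    lo, hi = 0.0, 1.0
    while cgf_derivs(hi, g, mu)[1] < s:  # bracket the root of K'(t) = s
        hi *= 2.0
        if hi > t_max:
            raise ValueError("s is at or beyond the edge of the support")
    for _ in range(200):  # bisection; K' is increasing because K'' > 0
        mid = 0.5 * (lo + hi)
        if cgf_derivs(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    k0, _, k2 = cgf_derivs(t, g, mu)
    w = math.sqrt(2.0 * (t * s - k0))
    v = t * math.sqrt(k2)
    return normal_upper_tail(w + math.log(v / w) / w)

# Hypothetical toy data: 1,000 subjects, 1:99 case-control ratio,
# 20 carriers of a rare allele, 3 of whom are cases.
n, n_carriers, case_rate = 1000, 20, 0.01
g = [1] * n_carriers + [0] * (n - n_carriers)
y = [1] * 3 + [0] * 17 + [1] * 7 + [0] * 973
mu = [case_rate] * n

s = sum(gi * (yi - mi) for gi, yi, mi in zip(g, y, mu))
p_normal = normal_tail(s, g, mu)
p_spa = spa_tail(s, g, mu)
# With equal mu, the number of case carriers is Binomial(20, 0.01),
# so the exact upper tail P(X >= 3) is available for comparison.
p_exact = sum(math.comb(20, k) * 0.01 ** k * 0.99 ** (20 - k) for k in range(3, 21))

print(f"normal: {p_normal:.3g}  SPA: {p_spa:.3g}  exact: {p_exact:.3g}")
```

On this toy input the normal-approximation tail is many orders of magnitude smaller than the exact binomial tail, which is the kind of anti-conservative behavior the abstract describes, while the saddlepoint tail stays within an order of magnitude of the exact value. Because the statistic is discrete and no continuity correction is applied here, the saddlepoint value is not expected to match the exact tail.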




