A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES.

Author information

Zhou Xiang

Affiliation

University of Michigan.

Publication information

Ann Appl Stat. 2017 Dec;11(4):2027-2051. doi: 10.1214/17-AOAS1052. Epub 2017 Dec 28.

DOI:10.1214/17-AOAS1052

PMID:29515717

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5836736/

Abstract

Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, restricted maximum likelihood estimation (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case-control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We also provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while still producing estimates that can be almost as accurate as if both quantities were computed using the full data. As a result, our method produces unbiased and statistically efficient estimates and makes use of summary statistics, while remaining computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
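
The claim that a moment-based estimator can be rewritten in terms of summary statistics can be illustrated with a small sketch. This is not the MQS implementation in GEMMA. It assumes a simple trace-matching method-of-moments estimator for the model var(y) = sigma_g^2 K + sigma_e^2 I with K = X X^T / p, and all function and variable names are hypothetical. The point it shows is that y^T K y, tr(K), and tr(K^2) depend on the data only through the vector z = X^T y / sqrt(n) and the SNP cross-product matrix R = X^T X / n, so the individual-level and summary-level forms give the same estimates. When the columns of X and y are standardized, z approximates the marginal z-scores and R the SNP correlation matrix. Plain Python is used to keep the sketch dependency-free.

```python
import math
import random


def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [s_g, s_e] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)


def mom_individual(X, y):
    """Moment estimates of (sigma_g^2, sigma_e^2) from individual-level data.

    X is an n x p list of rows (genotypes), y a list of n phenotypes.
    """
    n, p = len(X), len(X[0])
    # Relatedness matrix K = X X^T / p (n x n).
    K = [[sum(u * v for u, v in zip(X[i], X[k])) / p for k in range(n)]
         for i in range(n)]
    tr_K = sum(K[i][i] for i in range(n))
    tr_K2 = sum(v * v for row in K for v in row)  # tr(K^2), K is symmetric
    yKy = sum(y[i] * K[i][k] * y[k] for i in range(n) for k in range(n))
    yty = sum(v * v for v in y)
    # Match E[y^T K y] and E[y^T y] to their observed values.
    return solve_2x2(tr_K2, tr_K, tr_K, n, yKy, yty)


def mom_summary(z, R, n, yty):
    """Same estimator, using only z = X^T y / sqrt(n), R = X^T X / n, n, y^T y."""
    p = len(z)
    tr_K = n * sum(R[j][j] for j in range(p)) / p
    tr_K2 = n * n * sum(v * v for row in R for v in row) / (p * p)
    yKy = n * sum(v * v for v in z) / p
    return solve_2x2(tr_K2, tr_K, tr_K, n, yKy, yty)


# Small simulated data set (assumed sizes and effect variances, for illustration).
random.seed(0)
n, p = 60, 20
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
beta = [random.gauss(0, math.sqrt(0.5 / p)) for _ in range(p)]
y = [sum(x * b for x, b in zip(row, beta)) + random.gauss(0, math.sqrt(0.5))
     for row in X]

# Summary statistics derived from the same data.
z = [sum(X[i][j] * y[i] for i in range(n)) / math.sqrt(n) for j in range(p)]
R = [[sum(X[i][j] * X[i][l] for i in range(n)) / n for l in range(p)]
     for j in range(p)]
yty = sum(v * v for v in y)

est_individual = mom_individual(X, y)
est_summary = mom_summary(z, R, n, yty)
```

Here `est_individual` and `est_summary` agree up to floating-point error because the two forms are algebraically identical. In this sketch R is computed from all samples; estimating it from a subset or a reference panel, as the abstract describes, would require the corrections developed in the paper and is not attempted here.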

Similar articles

Accurate and Efficient Estimation of Local Heritability using Summary Statistics and LD Matrix.

bioRxiv. 2023 Mar 22:2023.02.08.527759. doi: 10.1101/2023.02.08.527759.

REHE: Fast variance components estimation for linear mixed models.

Genet Epidemiol. 2021 Dec;45(8):891-905. doi: 10.1002/gepi.22432. Epub 2021 Oct 17.

Fast heritability estimation based on MINQUE and batch training.

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac115.

Accurate genetic and environmental covariance estimation with composite likelihood in genome-wide association studies.

PLoS Genet. 2021 Jan 4;17(1):e1009293. doi: 10.1371/journal.pgen.1009293. eCollection 2021 Jan.

Estimating heritability of complex traits from genome-wide association studies using IBS-based Haseman-Elston regression.

Front Genet. 2014 Apr 30;5:107. doi: 10.3389/fgene.2014.00107. eCollection 2014.

Hybrid of Restricted and Penalized Maximum Likelihood Method for Efficient Genome-Wide Association Study.

Genes (Basel). 2020 Oct 29;11(11):1286. doi: 10.3390/genes11111286.

Is single-step genomic REML with the algorithm for proven and young more computationally efficient when less generations of data are present?

J Anim Sci. 2022 May 1;100(5). doi: 10.1093/jas/skac082.

SumVg: Total Heritability Explained by All Variants in Genome-Wide Association Studies Based on Summary Statistics with Standard Error Estimates.

Int J Mol Sci. 2024 Jan 22;25(2):1347. doi: 10.3390/ijms25021347.

Fast and Accurate Construction of Confidence Intervals for Heritability.

Am J Hum Genet. 2016 Jun 2;98(6):1181-1192. doi: 10.1016/j.ajhg.2016.04.016.

Cited by

An introgression from Triticum timopheevii reduces grain protein content in winter wheat populations.

Plant Genome. 2025 Sep;18(3):e70090. doi: 10.1002/tpg2.70090.

Unravelling the genetic architecture of cerebral small vessel disease in the context of stroke.

J Cereb Blood Flow Metab. 2025 Aug 6:271678X251362977. doi: 10.1177/0271678X251362977.

Towards improved fine-mapping of candidate causal variants.

Nat Rev Genet. 2025 Jul 28. doi: 10.1038/s41576-025-00869-4.

Dissecting the genetic basis of response to salmonid alphavirus in Atlantic salmon.

BMC Genomics. 2025 Jul 11;26(1):657. doi: 10.1186/s12864-025-11735-2.

Precise estimation of in-depth relatedness in biobank-scale datasets using deepKin.

Cell Rep Methods. 2025 Jun 16;5(6):101053. doi: 10.1016/j.crmeth.2025.101053. Epub 2025 May 27.

Root restriction accelerates genomic target identification in quinoa under controlled conditions.

Physiol Plant. 2025 Mar-Apr;177(2):e70223. doi: 10.1111/ppl.70223.

fastGxE: Powering genome-wide detection of genotype-environment interactions in biobank studies.

Res Sq. 2025 Mar 20:rs.3.rs-5952773. doi: 10.21203/rs.3.rs-5952773/v1.

Marginal interaction test for detecting interactions between genetic marker sets and environment in genome-wide studies.

G3 (Bethesda). 2025 Jan 8;15(1). doi: 10.1093/g3journal/jkae263.

Controlling for polygenic genetic confounding in epidemiologic association studies.

Proc Natl Acad Sci U S A. 2024 Oct 29;121(44):e2408715121. doi: 10.1073/pnas.2408715121. Epub 2024 Oct 21.

Genome wide association study reveals new genes for resistance to striped stem borer in rice (Oryza sativa L.).

Front Plant Sci. 2024 Sep 13;15:1466857. doi: 10.3389/fpls.2024.1466857. eCollection 2024.
