DISSCO: direct imputation of summary statistics allowing covariates.

Authors

Xu Zheng, Duan Qing, Yan Song, Chen Wei, Li Mingyao, Lange Ethan, Li Yun

Affiliations

Department of Biostatistics, Department of Genetics, Department of Computer Science.

Department of Genetics, Curriculum in Bioinformatics and Computational Biology, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599, USA.

Publication information

Bioinformatics. 2015 Aug 1;31(15):2434-42. doi: 10.1093/bioinformatics/btv168. Epub 2015 Mar 24.

DOI:10.1093/bioinformatics/btv168

PMID:25810429

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4514926/

Abstract

BACKGROUND

Imputation of individual-level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual-level genotype data are not available. Two methods, DIST and ImpG-Summary/LD, have been proposed for imputing association summary statistics; both assume that the statistics follow a multivariate Gaussian distribution. However, both methods also assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates.
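To make the multivariate Gaussian idea concrete, the following is a minimal sketch of conditional-mean imputation of a z-score at one untyped marker from z-scores at typed markers. It is not the DIST or ImpG-Summary/LD implementation: the function names `solve` and `impute_z`, the optional `ridge` parameter, and the toy correlation values are all assumptions made for illustration. The sketch relies only on the standard result that, for jointly Gaussian z-scores with unit variance, the conditional mean is Σ_ut Σ_tt⁻¹ z_t and the conditional variance is 1 − Σ_ut Σ_tt⁻¹ Σ_tu.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def impute_z(z_typed, corr_tt, corr_ut, ridge=0.0):
    """Conditional mean and variance of the z-score at one untyped marker.

    z_typed : z-scores at the typed markers
    corr_tt : correlation matrix among typed markers (from a reference panel)
    corr_ut : correlations between the untyped marker and each typed marker
    ridge   : hypothetical regularization added to the diagonal of corr_tt;
              the returned variance is the exact conditional variance only
              when ridge == 0
    """
    n = len(z_typed)
    reg = [[corr_tt[i][j] + (ridge if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    weights = solve(reg, corr_ut)  # weights = corr_tt^{-1} corr_ut
    z_hat = sum(w * z for w, z in zip(weights, z_typed))
    var = 1.0 - sum(w * c for w, c in zip(weights, corr_ut))
    return z_hat, var


# Toy usage: one typed marker with z = 2.0, correlated 0.8 with the untyped marker.
z_hat, var = impute_z([2.0], [[1.0]], [0.8])
print(z_hat, var)  # approximately 1.6 and 0.36
```

Which matrix is plugged in as `corr_tt` and `corr_ut` is exactly the choice the abstract discusses: the existing methods use genotype correlations estimated from a reference panel.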

METHODS

We show analytically that, in the absence of covariates, the correlation among association summary statistics is indeed the same as that among the corresponding genotypes, which provides a theoretical justification for the recently proposed methods. We further prove that, in the presence of covariates, the correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO).
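The difference between correlation and partial correlation can be illustrated with a small sketch. It is not the DISSCO software: the helper names (`pearson`, `residualize`, `partial_corr`), the restriction to a single covariate, and the toy data are assumptions chosen for illustration. The partial correlation is computed in the usual way, by regressing each genotype vector on the covariate (with an intercept) and correlating the residuals.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residualize(g, c):
    """Residuals of g after simple linear regression on covariate c with an intercept."""
    mg, mc = mean(g), mean(c)
    slope = (sum((a - mg) * (b - mc) for a, b in zip(g, c))
             / sum((b - mc) ** 2 for b in c))
    return [(a - mg) - slope * (b - mc) for a, b in zip(g, c)]


def partial_corr(g1, g2, c):
    """Partial correlation of g1 and g2 controlling for a single covariate c."""
    return pearson(residualize(g1, c), residualize(g2, c))


# Hypothetical toy data: a binary covariate (for example, an ancestry indicator)
# shifts the allele dosage at both markers, inducing correlation between them.
covariate = [0, 0, 0, 0, 1, 1, 1, 1]
g1 = [1, 0, 1, 0, 2, 1, 2, 1]
g2 = [1, 1, 0, 0, 2, 2, 1, 1]

print(pearson(g1, g2))                  # 0.5: marginal genotype correlation
print(partial_corr(g1, g2, covariate))  # 0.0: correlation vanishes given the covariate
```

In this toy case the two markers look correlated only because both track the covariate. An analysis that adjusts for that covariate would, according to the result stated above, produce summary statistics whose correlation follows the partial correlation rather than the marginal one.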

RESULTS

We consider two real-life scenarios in which the difference between correlation and partial correlation is likely to matter in practice: (i) association studies in admixed populations; (ii) association studies in the presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows performance at least comparable to, if not better than, existing correlation-based methods, particularly for lower-frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9-15.2% for variants with minor allele frequency <5%.

