Powerful and efficient SNP-set association tests across multiple phenotypes using GWAS summary data.

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA.

Publication information

Bioinformatics. 2019 Apr 15;35(8):1366-1372. doi: 10.1093/bioinformatics/bty811.

DOI:10.1093/bioinformatics/bty811

PMID:30239606

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6477978/

Abstract

MOTIVATION

Many GWAS conducted in the past decade have identified tens of thousands of disease-related variants, which together explain only part of the heritability of most traits. Many more genetic variants with small effect sizes remain to be discovered. This has motivated sequencing studies with larger sample sizes and increased resolution of genotyped variants, e.g. the ongoing NHLBI Trans-Omics for Precision Medicine (TOPMed) whole genome sequencing project. An alternative approach is to develop novel and more powerful statistical methods. The dominant approach in GWAS analysis is currently the "single trait single variant" association test, even though most GWAS are conducted in deeply phenotyped cohorts with many correlated traits measured. In this paper, we aim to develop rigorous methods that integrate multiple correlated traits and multiple variants to improve the power to detect novel variants. Because raw genotype and phenotype data are difficult to access due to privacy and logistical concerns, we develop methods that are applicable to publicly available GWAS summary data.

RESULTS

We build rigorous statistical models for GWAS summary statistics to motivate novel multi-trait SNP-set association tests, including a variance component test, a burden test and an adaptive test based on them, and develop efficient numerical algorithms to quickly compute their analytical P-values. We implement the proposed methods in an open source R package. We conduct thorough simulation studies to verify that the proposed methods rigorously control type I errors at the genome-wide significance level, and further demonstrate their utility through a comprehensive analysis of GWAS summary data for multiple lipid traits and glycemic traits. We identified many novel loci that were not detected by the individual-trait GWAS analyses.
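To make the idea of a multi-trait burden test on summary data concrete, the following is a minimal sketch and not the authors' implementation or the MSKAT API. It assumes that, under the null, the per-SNP Z-scores for each trait are multivariate normal with covariance equal to the SNP correlation (LD) matrix, and that the two traits have a known null correlation `rho`. The function names, the two-trait restriction and the equal default weights are all illustrative assumptions.

```python
import math


def burden_z(z, ld, w):
    """Collapse one trait's per-SNP Z-scores into a single standardized score.

    Assumes z ~ N(0, ld) under the null, so the weighted sum w'z has
    variance w' ld w (hypothetical helper, not part of any package).
    """
    m = len(z)
    weighted_sum = sum(w[i] * z[i] for i in range(m))
    variance = sum(w[i] * ld[i][j] * w[j] for i in range(m) for j in range(m))
    return weighted_sum / math.sqrt(variance)


def two_trait_burden_test(z1, z2, ld, rho, w=None):
    """Joint burden test of one SNP set against two correlated traits.

    z1, z2: per-SNP Z-scores for trait 1 and trait 2 (same SNP order).
    ld:     SNP correlation matrix as a list of lists.
    rho:    assumed null correlation between the two traits' Z-scores.
    Returns (statistic, p_value); the statistic is chi-square with 2 df
    under the stated null assumptions.
    """
    if w is None:
        w = [1.0] * len(z1)  # equal weights, an illustrative default
    b1 = burden_z(z1, ld, w)
    b2 = burden_z(z2, ld, w)
    # Quadratic form b' S^{-1} b with S = [[1, rho], [rho, 1]].
    stat = (b1 * b1 - 2.0 * rho * b1 * b2 + b2 * b2) / (1.0 - rho * rho)
    # Survival function of a chi-square with 2 df has the closed form exp(-x/2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


if __name__ == "__main__":
    ld = [[1.0, 0.5], [0.5, 1.0]]
    stat, p = two_trait_burden_test([2.1, 1.8], [1.5, 2.4], ld, rho=0.3)
    print(f"statistic = {stat:.3f}, p-value = {p:.4g}")
```

A variance component test would instead use a quadratic form in the Z-scores whose null distribution is a mixture of chi-squares, which needs a numerical P-value routine and is omitted here.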

AVAILABILITY AND IMPLEMENTATION

We have implemented the proposed methods in an R package freely available at http://www.github.com/baolinwu/MSKAT.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
