IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

Affiliations

School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China.

Department of Mathematics, Hong Kong Baptist University, Hong Kong.

Publication information

Bioinformatics. 2017 Sep 15;33(18):2882-2889. doi: 10.1093/bioinformatics/btx314.

Abstract

MOTIVATION

Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, a phenomenon known as 'polygenicity'. Tens of thousands of samples are often required to achieve adequate statistical power to identify these small-effect variants. However, a research group can often obtain approval to access individual-level genotype data only for a limited sample size (e.g. a few hundred or a few thousand individuals). Meanwhile, summary statistics generated by single-variant analysis are increasingly publicly available, and the sample sizes behind these summary statistics datasets are usually quite large. How to make the most efficient use of these abundant existing data resources remains largely an open question.

RESULTS

In this study, we propose a statistical approach, IGESS, to increase the statistical power of identifying risk variants and improve the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods that take either individual-level data or summary statistics as input. We applied IGESS to an integrative analysis of individual-level Crohn's disease data from the WTCCC (Wellcome Trust Case Control Consortium) and summary statistics from other studies. IGESS significantly increased the statistical power of identifying risk variants and improved risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240,000 variants.
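The abstract does not spell out how the two data sources are combined. As a hedged illustration only, the sketch below assumes a beta-uniform two-groups model for summary-statistic p-values (a null variant's p-value is Uniform(0, 1), a risk variant's is Beta(α, 1) with 0 < α < 1) and assumes the individual-level evidence for a variant is already summarized as a Bayes factor that is conditionally independent of the p-value. Under those assumptions the per-variant posterior inclusion probability combines on the odds scale. This is not the IGESS variational inference algorithm; the function names, the Bayes-factor input and the parameter values are hypothetical.

```python
def pvalue_likelihood_ratio(p: float, alpha: float) -> float:
    """Ratio f1(p) / f0(p) for an assumed beta-uniform two-groups model.

    f0: Uniform(0, 1) density, equal to 1 on (0, 1].
    f1: Beta(alpha, 1) density, alpha * p**(alpha - 1), with 0 < alpha < 1,
        which concentrates mass near p = 0 for associated variants.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return alpha * p ** (alpha - 1.0)


def posterior_inclusion(prior_pi: float, p: float, alpha: float,
                        bayes_factor_individual: float = 1.0) -> float:
    """Posterior probability that one variant is associated (hypothetical helper).

    Combines, on the odds scale:
      * a prior inclusion probability prior_pi,
      * a Bayes factor summarizing individual-level genotype evidence
        (assumed given; 1.0 means "no individual-level evidence"),
      * the summary-statistic p-value via the beta-uniform likelihood ratio,
    treating the two evidence sources as conditionally independent.
    """
    if not 0.0 < prior_pi < 1.0:
        raise ValueError("prior_pi must lie in (0, 1)")
    if bayes_factor_individual <= 0.0:
        raise ValueError("Bayes factor must be positive")
    prior_odds = prior_pi / (1.0 - prior_pi)
    post_odds = prior_odds * bayes_factor_individual * pvalue_likelihood_ratio(p, alpha)
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    prior = 0.01  # hypothetical prior fraction of risk variants
    print(posterior_inclusion(prior, 1e-8, 0.5))       # strong summary-level signal
    print(posterior_inclusion(prior, 0.5, 0.5))        # uninformative p-value
    print(posterior_inclusion(prior, 1e-3, 0.5, 4.0))  # both sources modestly agree
```

With α = 0.5 and a 1% prior, a p-value of 1e-8 lifts the posterior to about 0.98, while p = 0.5 pulls it slightly below the prior. As α approaches 1, Beta(α, 1) approaches the uniform null, so the summary statistics become less informative.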

AVAILABILITY AND IMPLEMENTATION

The IGESS software is available at https://github.com/daviddaigithub/IGESS.

CONTACT

zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
