Zheng Jie, Erzurumluoglu A Mesut, Elsworth Benjamin L, Kemp John P, Howe Laurence, Haycock Philip C, Hemani Gibran, Tansey Katherine, Laurin Charles, Pourcain Beate St, Warrington Nicole M, Finucane Hilary K, Price Alkes L, Bulik-Sullivan Brendan K, Anttila Verneri, Paternoster Lavinia, Gaunt Tom R, Evans David M, Neale Benjamin M
MRC Integrative Epidemiology Unit, University of Bristol, Oakfield House, Bristol, UK.
Genetic Epidemiology Group, Department of Health Sciences, University of Leicester, Leicester, UK.
Bioinformatics. 2017 Jan 15;33(2):272-279. doi: 10.1093/bioinformatics/btw613. Epub 2016 Sep 22.
LD score regression is a reliable and efficient method of using genome-wide association study (GWAS) summary-level results data to estimate the SNP heritability of complex traits and diseases, partition this heritability into functional categories, and estimate the genetic correlation between different phenotypes. Because the method relies on summary-level results data, LD score regression is computationally tractable even for very large sample sizes. However, publicly available GWAS summary-level data are typically stored in different databases and in different formats, making it difficult to apply LD score regression to estimate genetic correlations across many different traits simultaneously.
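As a rough illustration of why summary-level data suffice, the core idea of LD score regression can be sketched in a few lines. Under the standard polygenic model, the expected association chi-square statistic of SNP j is approximately 1 + N·a + (N·h²/M)·ℓ_j, where ℓ_j is the SNP's LD score, N the GWAS sample size, M the number of SNPs, and a a term capturing confounding. Regressing chi-square statistics on LD scores therefore yields a slope proportional to the SNP heritability h². The sketch below is an unweighted ordinary least squares version run on synthetic data. It is not the LD Hub or `ldsc` implementation, which uses weighted regression and block jackknife standard errors; the function name `ldsc_h2` and all simulated values are assumptions made for illustration only.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD score regression (illustrative sketch only).

    Fits chi2_j = intercept + slope * ld_score_j by ordinary least squares
    and converts the slope to SNP heritability via h2 = slope * M / N.
    """
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return slope * n_snps / n_samples, intercept


# Synthetic example: every number below is made up for illustration.
rng = np.random.default_rng(0)
n_snps, n_samples, true_h2 = 50_000, 100_000, 0.4

# Hypothetical LD scores drawn from a gamma distribution (mean 100).
ld_scores = rng.gamma(shape=2.0, scale=50.0, size=n_snps)

# With no confounding, E[chi2_j] = 1 + N * h2 * l_j / M; a scaled
# 1-df chi-square draw has exactly this mean.
expected_chi2 = 1.0 + n_samples * true_h2 * ld_scores / n_snps
chi2 = expected_chi2 * rng.chisquare(df=1, size=n_snps)

h2_hat, intercept_hat = ldsc_h2(chi2, ld_scores, n_samples, n_snps)
print(f"estimated h2 = {h2_hat:.3f}, intercept = {intercept_hat:.2f}")
```

Only per-SNP summary statistics and precomputed LD scores enter the regression, so the cost scales with the number of SNPs rather than with the number of genotyped individuals.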
In this manuscript, we describe LD Hub, a centralized database of summary-level GWAS results for 173 diseases/traits from different publicly available resources/consortia, together with a web interface that automates the LD score regression analysis pipeline. To demonstrate functionality and validate our software, we replicated previously reported LD score regression analyses of 49 traits/diseases using LD Hub, and estimated SNP heritability and the genetic correlation across the different phenotypes. We also present new results obtained by uploading a recent atopic dermatitis GWAS meta-analysis to examine the genetic correlation between the condition and other potentially related traits. In response to the growing availability of publicly accessible GWAS summary-level results data, our database and the accompanying web interface will ensure maximal uptake of the LD score regression methodology, provide a useful database for the public dissemination of GWAS results, and provide a method for easily screening hundreds of traits for overlapping genetic aetiologies.
The web interface and instructions for using LD Hub are available at http://ldsc.broadinstitute.org/. Contact: jie.zheng@bristol.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.