MRC Integrative Epidemiology Unit, University of Bristol, Oakfield House, Bristol, BS8 2BN, UK.
Intelligent Systems Laboratory, University of Bristol, Tyndall Ave, Bristol, BS8 1TH, UK.
Gigascience. 2018 Aug 1;7(8):giy090. doi: 10.1093/gigascience/giy090.
Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Because access to individual-level phenotype data is often restricted, it is difficult to estimate phenotypic correlations at scale across the human phenome. Two state-of-the-art methods, metaCCA and LD score regression, offer an alternative approach that estimates phenotypic correlation using only genome-wide association study (GWAS) summary results.
Here, we present PhenoSpD, an integrated R toolkit that uses LD score regression to estimate phenotypic correlations from GWAS summary statistics and then applies spectral decomposition of matrices (SpD) to the estimated correlations to inform multiple testing correction for complex human traits. Simulations suggest that it is possible to identify nonindependence of phenotypes using samples with partial overlap; as overlap decreases, the estimated phenotypic correlations attenuate toward zero and the multiple testing correction becomes more stringent than in perfectly overlapping samples. Also, in contrast to LD score regression, metaCCA provides approximate genetic correlations rather than phenotypic correlations, which limits its application for multiple testing correction. In a case study, PhenoSpD applied to UK Biobank GWAS results suggested 399.6 independent tests among 487 human traits, which is close to the 352.4 independent tests estimated using the true phenotypic correlations. We further applied PhenoSpD to 5,618 estimated pairwise phenotypic correlations among 107 metabolites, using GWAS summary statistics from Kettunen's publication, and PhenoSpD suggested the equivalent of 33.5 independent tests for these metabolites.
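The link between sample overlap, attenuated correlations, and a stricter correction can be illustrated with a minimal Python sketch. It is not the PhenoSpD implementation, and it rests on two assumptions that the abstract does not spell out. First, the expected cross-trait LD score regression intercept is taken to be rho * Ns / sqrt(N1 * N2), where rho is the phenotypic correlation, Ns the number of shared samples, and N1 and N2 the two GWAS sample sizes. Second, the number of independent tests is computed with Nyholt's (2004) SpD estimator, Meff = 1 + (M - 1)(1 - Var(lambda) / M); whether PhenoSpD reports exactly this variant is not stated here. The function names and the toy equicorrelated matrix are hypothetical. Because the eigenvalues of a correlation matrix sum to M and their squares sum to the sum of squared entries, the eigenvalue variance is obtained without an eigen-solver.

```python
from math import sqrt


def attenuated_correlation(rho, n_shared, n1, n2):
    """Expected cross-trait LD score regression intercept (assumed formula).

    rho is the true phenotypic correlation, n_shared the number of
    individuals present in both GWAS, n1 and n2 the two sample sizes.
    """
    return rho * n_shared / sqrt(n1 * n2)


def nyholt_meff(corr):
    """Nyholt (2004) effective number of independent tests.

    corr is a square correlation matrix (list of lists, unit diagonal).
    The eigenvalues have mean 1 and their squares sum to the sum of
    squared matrix entries, so the sample variance of the eigenvalues
    follows directly from the entries.
    """
    m = len(corr)
    if m == 1:
        return 1.0
    sum_sq = sum(r * r for row in corr for r in row)
    eig_var = (sum_sq - m) / (m - 1)  # sample variance of the eigenvalues
    return 1 + (m - 1) * (1 - eig_var / m)


def exchangeable(m, r):
    """Toy m x m correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(m)] for i in range(m)]


def adjusted_alpha(alpha, meff):
    """Bonferroni-style threshold using the effective number of tests."""
    return alpha / meff


# Hypothetical scenario: 10 traits, true pairwise correlation 0.6,
# two GWAS of 10,000 individuals each with decreasing overlap.
m, rho, n = 10, 0.6, 10_000
for n_shared in (10_000, 5_000, 0):
    r_hat = attenuated_correlation(rho, n_shared, n, n)
    meff = nyholt_meff(exchangeable(m, r_hat))
    print(f"shared={n_shared:>6}  r_hat={r_hat:.2f}  "
          f"Meff={meff:.2f}  alpha={adjusted_alpha(0.05, meff):.5f}")
```

In this toy setting, full overlap recovers the correlation of 0.6 and gives about 6.76 effective tests, half overlap attenuates it to 0.3 and gives about 9.19, and no overlap yields all 10 tests, so the correction becomes more stringent as overlap falls. Applied to the metabolite result above, adjusted_alpha(0.05, 33.5) gives a threshold of roughly 0.0015, compared with roughly 0.00047 for a plain Bonferroni correction over all 107 metabolites.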
PhenoSpD extends the use of summary-level results, providing a simple and conservative way to reduce dimensionality for complex human traits using GWAS summary statistics. This is particularly valuable in the age of large-scale biobank and consortia studies, where GWAS results are much more accessible than individual-level data.