Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, Massachusetts, USA.
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Am J Med Genet B Neuropsychiatr Genet. 2021 Jan;186(1):16-27. doi: 10.1002/ajmg.b.32834. Epub 2021 Feb 11.
Genotype imputation across populations of mixed ancestry is critical for optimal discovery in large-scale genome-wide association studies (GWAS). Methods for direct imputation of GWAS summary statistics were previously shown to be practically as accurate as summary statistics produced after raw genotype imputation, while incurring an orders-of-magnitude lower computational burden. Because direct imputation requires a precise estimate of linkage disequilibrium (LD), and most methods rely on a small reference panel (for example, the ~2,500 subjects of the 1000 Genomes Project), there is a great need for much larger and more diverse reference panels. To accurately estimate the LD needed for an exhaustive analysis of any cosmopolitan cohort, we developed DISTMIX2. DISTMIX2: (a) uses a much larger and more diverse reference panel than traditional reference panels, and (b) can estimate ethnic-mixture weights based solely on Z-scores when allele frequencies are not available. We applied DISTMIX2 to GWAS summary statistics from the Psychiatric Genomics Consortium (PGC). DISTMIX2 uncovered signals in numerous new regions, with most of these findings coming from rarer variants. Rarer variants localize signals much more sharply than common variants, because the LD for rare variants extends over a shorter distance than for common ones. For example, while the original PGC post-traumatic stress disorder GWAS found only 3 marginal signals for common variants, we now uncover a very strong signal for a rare variant in PKN2, a gene associated with neuronal and hippocampal development. Thus, DISTMIX2 provides a robust and fast (re)imputation approach for most psychiatric GWAS.
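To make concrete why direct imputation depends so heavily on the LD estimate, the following is a minimal sketch of the general conditional-expectation approach to imputing Z-scores from summary statistics. It is not the DISTMIX2 implementation. The function names (`mix_ld`, `impute_z`), the ridge value, and the choice to model the cohort LD as a weighted sum of per-population LD matrices are assumptions made for illustration only.

```python
import numpy as np

def mix_ld(ld_by_pop, weights):
    """Cohort LD as a weighted sum of per-population LD (correlation) matrices.

    Assumption for illustration: the mixture weights are already known and
    each matrix covers the same SNPs in the same order.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    return sum(wi * np.asarray(ld, dtype=float) for wi, ld in zip(w, ld_by_pop))

def impute_z(ld, typed_idx, untyped_idx, z_typed, ridge=0.1):
    """Impute Z-scores at untyped SNPs from Z-scores at typed SNPs.

    Uses the conditional mean of a multivariate normal:
        z_u = S_ut (S_tt + ridge * I)^-1 z_t
    The ridge term (a hypothetical default here) stabilizes the inverse
    when the LD matrix is estimated from a finite reference panel.
    """
    ld = np.asarray(ld, dtype=float)
    z_typed = np.asarray(z_typed, dtype=float)
    s_tt = ld[np.ix_(typed_idx, typed_idx)] + ridge * np.eye(len(typed_idx))
    s_ut = ld[np.ix_(untyped_idx, typed_idx)]
    # S_tt is symmetric, so solving against S_ut^T and transposing
    # gives S_ut S_tt^-1 without forming an explicit inverse.
    coef = np.linalg.solve(s_tt, s_ut.T).T
    z_hat = coef @ z_typed
    # Fraction of each untyped SNP's variance explained by the typed SNPs.
    info = np.einsum("ij,ij->i", coef, s_ut)
    return z_hat, info

# Toy usage: two reference populations, three SNPs, SNP 2 untyped.
ld_pop_a = np.array([[1.0, 0.6, 0.8], [0.6, 1.0, 0.5], [0.8, 0.5, 1.0]])
ld_pop_b = np.array([[1.0, 0.2, 0.4], [0.2, 1.0, 0.1], [0.4, 0.1, 1.0]])
cohort_ld = mix_ld([ld_pop_a, ld_pop_b], weights=[0.7, 0.3])
z_hat, info = impute_z(cohort_ld, typed_idx=[0, 1], untyped_idx=[2],
                       z_typed=[3.5, 1.2])
```

In this sketch, every imputed Z-score is a linear combination of observed Z-scores whose coefficients come entirely from the LD matrix, so a reference panel that misrepresents the cohort's ancestry mixture biases the result directly.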