Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, 1470 Madison Avenue, Room 8-116, Box 1498, New York, NY 10029, USA.
Am J Hum Genet. 2022 Jun 2;109(6):1065-1076. doi: 10.1016/j.ajhg.2022.04.016. Epub 2022 May 23.
The human genome contains tens of thousands of large tandem repeats and hundreds of genes that show common and highly variable copy-number changes. Due to their large size and repetitive nature, these variable number tandem repeats (VNTRs) and multicopy genes are generally recalcitrant to standard genotyping approaches and, as a result, this class of variation is poorly characterized. However, several recent studies have demonstrated that copy-number variation of VNTRs can modify local gene expression, epigenetics, and human traits, indicating that many have a functional role. Here, using read depth from whole-genome sequencing to profile copy number, we report results of a phenome-wide association study (PheWAS) of VNTRs and multicopy genes in a discovery cohort of ∼35,000 samples, identifying 32 traits associated with copy number of 38 VNTRs and multicopy genes at 1% FDR. We replicated many of these signals in an independent cohort and observed that VNTRs showing trait associations were significantly enriched for expression QTLs with nearby genes, providing strong support for our results. Fine-mapping studies indicated that in the majority (∼90%) of cases, the VNTRs and multicopy genes we identified represent the causal variants underlying the observed associations. Furthermore, several lie in regions where prior SNV-based GWASs have failed to identify any significant associations with these traits. Our study indicates that copy number of VNTRs and multicopy genes contributes to diverse human traits and suggests that complex structural variants potentially explain some of the so-called "missing heritability" of SNV-based GWASs.
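The two quantitative steps the abstract mentions, read-depth copy-number profiling and a 1% FDR threshold, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the copy-number estimate assumes a simple ratio against a genome-wide median depth in a diploid sample, and the Benjamini-Hochberg procedure is assumed here because the abstract does not say which FDR method was used.

```python
def estimate_copy_number(region_depth, genome_median_depth, ploidy=2):
    """Estimate total copy number of a region from relative read depth.

    Assumption: depth scales linearly with copy number and the
    genome-wide median depth corresponds to `ploidy` copies.
    """
    if genome_median_depth <= 0:
        raise ValueError("genome_median_depth must be positive")
    return ploidy * region_depth / genome_median_depth


def benjamini_hochberg(p_values, fdr=0.01):
    """Flag p-values that pass the Benjamini-Hochberg step-up procedure.

    Assumption: BH is used only for illustration; the abstract
    reports a 1% FDR without naming the method.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Illustrative values only, not data from the study.
copies = estimate_copy_number(region_depth=90.0, genome_median_depth=30.0)
flags = benjamini_hochberg([0.0001, 0.5, 0.004, 0.03], fdr=0.01)
```

In a real analysis the depth ratio would typically be corrected for GC content and mappability before association testing, and the copy-number estimate for each VNTR or multicopy gene would be regressed against each trait to produce the p-values that feed the FDR step.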