Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA.
Department of Biology, Boston University, Boston, MA 02215, USA.
Nucleic Acids Res. 2021 May 7;49(8):4308-4324. doi: 10.1093/nar/gkab224.
Variable Number Tandem Repeats (VNTRs) are tandem repeat (TR) loci that vary in copy number across a population. Using our program, VNTRseek, we analyzed human whole genome sequencing datasets from 2770 individuals to detect minisatellite VNTRs, i.e., those with pattern sizes ≥7 bp. We detected 35 638 VNTR loci and classified 5676 as commonly polymorphic (i.e., with non-reference alleles occurring in >5% of the population). Commonly polymorphic VNTR loci were enriched in genomic regions with regulatory function, i.e., transcription start sites and enhancers. Investigation of the commonly polymorphic VNTRs in the context of population ancestry revealed that 1096 loci contained population-specific alleles and that these could be used to classify individuals into super-populations with near-perfect accuracy. A search for expression quantitative trait loci (eQTLs) among the VNTRs proximal to genes indicated that, for 187 genes, expression differences correlated with VNTR genotype. We validated our predictions in several ways: experimentally, through the identification of predicted alleles in long reads, and by comparisons showing consistency between sequencing platforms. This study is the most comprehensive analysis of minisatellite VNTRs in the human population to date.
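The "commonly polymorphic" criterion above can be illustrated with a minimal sketch. This is not VNTRseek code. It assumes that ">5% of the population" means the fraction of individuals carrying at least one non-reference allele, and the function name, input layout (a list of observed copy numbers per individual) and parameters are hypothetical.

```python
from typing import List, Sequence


def is_commonly_polymorphic(
    genotypes: Sequence[List[int]],
    reference_copies: int,
    threshold: float = 0.05,
) -> bool:
    """Return True if more than `threshold` of individuals carry a non-reference allele.

    genotypes: one entry per individual, listing the copy numbers observed
               at this locus (hypothetical input format).
    reference_copies: copy number of the reference allele at this locus.
    """
    if not genotypes:
        return False
    # Count individuals with at least one allele differing from the reference.
    carriers = sum(
        1
        for alleles in genotypes
        if any(copies != reference_copies for copies in alleles)
    )
    # Strict inequality, matching the ">5%" wording of the abstract.
    return carriers / len(genotypes) > threshold
```

For example, a locus at which 6 of 100 individuals carry a non-reference copy number would pass this check, while one with exactly 5 of 100 would not, because the threshold is strict.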