Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA.
Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, USA.
Nat Commun. 2021 Apr 6;12(1):2075. doi: 10.1038/s41467-021-22206-z.
Variable number tandem repeats (VNTRs) account for significant genetic variation in many organisms. In humans, VNTRs have been implicated in both Mendelian and complex disorders, but genomic pipelines largely ignore them because genotyping is complex and computationally expensive. We describe adVNTR-NN, a method that uses shallow neural networks to genotype a VNTR in 18 seconds on 55X whole-genome data while maintaining high accuracy. We use adVNTR-NN to genotype 10,264 VNTRs in 652 GTEx individuals. Associating VNTR length with gene expression in 46 tissues, we identify 163 "eVNTRs". Of the 22 eVNTRs in blood for which independent data are available, 21 (95%) are replicated in terms of significance and direction of association. Of the eVNTR loci, 49% show a strong and likely causal impact on gene expression, and 80% have a maximum effect size of at least 0.3. The impacted genes are involved in diseases including Alzheimer's disease, obesity, and familial cancers, highlighting the importance of VNTRs for understanding the genetic basis of complex diseases.
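The idea of associating VNTR length with gene expression can be illustrated with a minimal sketch. The abstract does not state which statistical test the authors used, so the Pearson correlation and permutation test below are stand-ins chosen for simplicity, not the paper's method. The function names (`pearson_r`, `permutation_p_value`) and the toy repeat counts and expression values are hypothetical and invented for illustration only.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable: no measurable association
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |r|.

    Expression values are shuffled across individuals to break any
    link with repeat count; the p-value is the fraction of shuffles
    whose |r| is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (invented): VNTR repeat count and expression level per individual.
repeat_counts = [2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
expression = [1.0, 1.2, 1.4, 1.7, 2.1, 1.9, 2.6, 2.4, 3.0, 3.1]

r = pearson_r(repeat_counts, expression)
p = permutation_p_value(repeat_counts, expression)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

In a real analysis one would typically also adjust for covariates such as population structure and apply multiple-testing correction across VNTR-gene pairs and tissues; this sketch omits both.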