Thomas Willems, Melissa Gymrek, Gareth Highnam, David Mittelman, Yaniv Erlich
Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Computational and Systems Biology Program, MIT, Cambridge, Massachusetts 02139, USA;
Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA;
Genome Res. 2014 Nov;24(11):1894-904. doi: 10.1101/gr.177774.114. Epub 2014 Aug 18.
Short tandem repeats (STRs) are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and are frequently used in forensics, population genetics, and genetic genealogy. Despite these many applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We use this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.