Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA.
Hum Mutat. 2013 Sep;34(9):1304-11. doi: 10.1002/humu.22359. Epub 2013 Jun 17.
Although simple tandem repeats (STRs) comprise ~2% of the human genome and represent an important source of polymorphism, this class of variation remains understudied. We have developed a cost-effective strategy for targeted enrichment of STR regions that uses capture probes targeting the flanking sequences of STR loci, enabling specific capture of DNA fragments containing STRs for subsequent high-throughput sequencing. Using a capture design targeting 6,243 STR loci <94 bp and multiplexing eight individuals in a single Illumina HiSeq2000 sequencing lane, we were able to call genotypes in at least one individual for 67.5% of the targeted STRs. We observed a strong relationship between (G+C) content and genotyping rate: STRs with moderate (G+C) content were recovered with a >90% success rate, whereas only 12% of STRs with ≥80% (G+C) were genotyped in our assay. Analysis of a parent-offspring trio, complete hydatidiform mole samples, repeat analyses of the same individual, and Sanger sequencing-based validation indicated genotyping error rates between 7.6% and 12.4%. The majority of such errors were differences of a single repeat unit at mono- or dinucleotide repeats. Altogether, our STR capture assay represents a cost-effective method that enables multiplexed genotyping of thousands of STR loci and is suitable for large-scale population studies.
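One of the validation approaches named above, analysis of a parent-offspring trio, can be illustrated with a short sketch. The code below is not the authors' pipeline; it is a minimal example, assuming each STR genotype is recorded as a pair of repeat-unit counts, and the function names, locus labels, and genotype values are all hypothetical. It flags loci where the child's alleles cannot be explained by one allele from each parent. The resulting fraction is only a rough proxy for genotyping error, since some errors remain Mendelian-consistent and rare de novo mutations would also be counted.

```python
from typing import Dict, Tuple

# A genotype is two alleles, each expressed as a repeat-unit count (assumption).
Genotype = Tuple[int, int]
Trio = Tuple[Genotype, Genotype, Genotype]  # (child, mother, father)


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """Return True if one child allele can come from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_inconsistency_rate(trio_calls: Dict[str, Trio]) -> float:
    """Fraction of loci whose trio genotypes violate Mendelian inheritance."""
    if not trio_calls:
        raise ValueError("no loci with complete trio genotypes")
    inconsistent = sum(
        1
        for child, mother, father in trio_calls.values()
        if not is_mendelian_consistent(child, mother, father)
    )
    return inconsistent / len(trio_calls)


# Hypothetical genotype calls for four loci (illustrative values only).
example_calls: Dict[str, Trio] = {
    "locus_A": ((12, 14), (12, 13), (14, 15)),  # consistent
    "locus_B": ((9, 9), (9, 10), (9, 11)),      # consistent
    "locus_C": ((7, 9), (7, 7), (8, 10)),       # inconsistent: allele 9 is in neither parent
    "locus_D": ((20, 21), (19, 20), (21, 21)),  # consistent
}

rate = mendelian_inconsistency_rate(example_calls)
print(f"Mendelian inconsistency rate: {rate:.2%}")
```

Only loci genotyped in all three family members should be passed in; loci with missing calls would need to be filtered out beforehand.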