Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA.
Gene. 2013 Mar 10;516(2):328-34. doi: 10.1016/j.gene.2012.12.068. Epub 2012 Dec 26.
Using our microsatellite-specific genotyping method, we analyzed tandem repeats in over 500 individuals who were exon-sequenced in a 1000 Genomes Project pilot study. Tandem repeats are known to be highly variable, and some are recognized as biomarkers causative of disease. We were able to genotype over 97% of the microsatellite loci in the targeted regions. A total of 25,115 variations were observed, including repeat length variations and single nucleotide polymorphisms, corresponding to an average of 45.6 variations per individual and a density of 1.1 variations per kilobase. Standard variant detection did not report 94.2% of the exonic repeat length variations, in part because standard alignment techniques are not well suited to repetitive regions. Additionally, some standard variation detection tools rely on a database of known variations, making them less likely to call repeat length variations, as only a small fraction of these loci (~6000) have been accurately characterized. A subset of the hundreds of non-synonymous variations we identified was experimentally validated, indicating an accuracy of 96.5% for our microsatellite-based genotyping method; some of the novel variants identified are in genes associated with cancer. We propose that microsatellite-based genotyping be used as part of large-scale sequencing studies to identify novel variants.