Wang Tao, Pradhan Kith, Ye Kenny, Wong Lee-Jun, Rohan Thomas E
Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA.
Front Genet. 2011 Aug 17;2:51. doi: 10.3389/fgene.2011.00051. eCollection 2011.
Both common and rare mitochondrial DNA (mtDNA) variants may contribute to genetic susceptibility to some complex human diseases. Understanding the role of mtDNA variants will provide valuable insights into the etiology of these diseases. However, to date, there have been no large-scale, genome-wide association studies of complete mtDNA variants and disease risk. One reason for this might be the substantial cost of sequencing the large number of samples required for genetic epidemiology studies. Next-generation sequencing of pooled mtDNA samples would dramatically reduce the cost of such studies and may represent an appealing approach for large-scale genetic epidemiology studies. However, the performance of the different designs for sequencing pooled mtDNA has not been evaluated.
We examined the approach of sequencing pooled mtDNA from multiple individuals to estimate allele frequency, using the Illumina Genome Analyzer (GA) II sequencing system. In this study, the pool included mtDNA samples from 20 subjects that had previously been sequenced using Sanger sequencing. Each pool was replicated once to assess the variation in sequencing error between pools. To reduce such variation, barcoding was used so that different pools could be sequenced in the same lane of the flow cell. To evaluate the effect of different pooling strategies, pooling was done at both the pre- and post-PCR amplification steps.
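As a rough illustration of estimating an allele frequency from pooled reads at a single site, the sketch below counts non-reference base calls among calls that pass a quality threshold. It is a minimal sketch, not the pipeline used in the study: the function names, the input format (a list of `(base, phred)` pairs), and the example data are all hypothetical. The step that maps a frequency to a number of carriers additionally assumes that each of the 20 subjects contributes equally to the pool and carries a single mtDNA haplotype, which the abstract does not state.

```python
from collections import Counter


def pooled_allele_frequency(calls, ref_base, min_phred=20):
    """Estimate the non-reference allele frequency at one site.

    calls: list of (base, phred) pairs, one per read covering the site
           (hypothetical input format).
    Returns None when no call passes the quality filter.
    """
    # Keep only base calls at or above the Phred threshold.
    kept = [base for base, phred in calls if phred >= min_phred]
    if not kept:
        return None
    counts = Counter(kept)
    non_ref = len(kept) - counts[ref_base]
    return non_ref / len(kept)


def nearest_carrier_count(freq, pool_size=20):
    """Map a frequency to the closest whole number of carriers.

    Assumption: equal contribution per subject and one haplotype each.
    """
    return round(freq * pool_size)


# Made-up example: 90 reference calls, 10 variant calls, and
# 50 low-quality variant calls that the filter discards.
example_calls = [("A", 30)] * 90 + [("G", 30)] * 10 + [("G", 5)] * 50
example_freq = pooled_allele_frequency(example_calls, ref_base="A")
example_carriers = nearest_carrier_count(example_freq)
```

In this made-up example the low-quality calls would otherwise inflate the variant frequency, which is why the filter is applied before counting.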
The sequencing error rate was close to that expected based on the Phred score. When only reads with Phred ≥ 20 were considered, the average error rate was about 0.3%. However, there was significant variation of the base-calling errors for different types of bases or at different loci. Using the results of the Sanger sequencing as the standard, the sensitivity of single nucleotide polymorphism detection with post-PCR pooling (about 99%) was higher than that of the pre-PCR pooling (about 82%), while the two approaches had similar specificity (about 99%). Among a total of 298 variants in the sample, the allele frequencies of 293 variants (98%) were correctly estimated with post-PCR pooling, the correlation between the estimated and the true allele frequencies being >0.99, while only 206 allele frequencies (69%) were correctly estimated in the pre-PCR pooling, the correlation being 0.89.
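The quantities reported above follow from standard definitions, which can be sketched as follows. The Phred relation (error probability = 10^(-Q/10)) is the conventional definition of the score; the helper names are hypothetical, and only the counts 293, 206, and 298 come from the abstract. The true/false positive and negative counts behind the reported sensitivity and specificity are not given, so none are filled in here.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def sensitivity(true_pos, false_neg):
    """Fraction of true variants that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-variant sites correctly called non-variant."""
    return true_neg / (true_neg + false_pos)


# Phred 20 corresponds to an expected error probability of 1%.
q20_error = phred_to_error_prob(20)

# Proportion of the 298 variants whose allele frequency was
# correctly estimated under each pooling strategy.
total_variants = 298
post_pcr_correct = 293 / total_variants
pre_pcr_correct = 206 / total_variants
```

A Phred threshold of 20 bounds the expected per-base error at 1%, so an observed average of about 0.3% is plausible if many retained calls have scores well above the threshold.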
Sequencing of mtDNA pooled after PCR amplification is a viable tool for screening mitochondrial variants potentially related to human diseases.