Zheng Jianbiao, Moorhead Martin, Weng Li, Siddiqui Farooq, Carlton Victoria E H, Ireland James S, Lee Liana, Peterson Joseph, Wilkins Jennifer, Lin Sean, Kan Zhengyan, Seshagiri Somasekar, Davis Ronald W, Faham Malek
Affymetrix Inc., 3420 Central Expressway, Santa Clara, CA 95051, USA.
Proc Natl Acad Sci U S A. 2009 Apr 21;106(16):6712-7. doi: 10.1073/pnas.0901902106. Epub 2009 Apr 2.
Although genomewide association studies have successfully identified associations of many common single-nucleotide polymorphisms (SNPs) with common diseases, the SNPs implicated so far account for only a small proportion of the genetic variability of the diseases tested. It has been suggested that common diseases may often be caused by rare alleles missed by genomewide association studies. Identifying these rare alleles requires high-throughput, high-accuracy resequencing technologies. Although array-based genotyping has enabled genomewide association studies of common SNPs in tens of thousands of samples, array-based resequencing has been limited for two main reasons: the lack of a fully multiplexed pipeline for high-throughput sample processing, and failure to achieve sufficient performance. We have recently solved both of these problems and created a fully multiplexed high-throughput pipeline that produces high-quality data. The pipeline begins with target amplification from genomic DNA, continues with allele enrichment to generate pools of purified variant (or nonvariant) DNA, and ends with interrogation of the purified DNA on resequencing arrays. We have used this pipeline to resequence approximately 5 Mb of DNA (on 3 arrays), corresponding to the exons of 1,500 genes, in >473 samples; in total, >2,350 Mb were sequenced. In this large-scale study we obtained a false-positive rate of approximately 1 in 500,000 bp and a false-negative rate of approximately 10%.
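The totals and error rates reported above can be cross-checked with simple arithmetic. The Python sketch below is only an illustration: it assumes the rounded figures in the abstract are exact (for example, it treats ">473" samples as exactly 473), and all constant and function names are hypothetical, not part of the described pipeline.

```python
"""Back-of-envelope arithmetic on the approximate figures quoted in the abstract.

Assumption: the rounded values reported there are used as if exact
(">473" samples is treated as 473). Outputs are rough consistency checks,
not results from the study.
"""

BP_PER_MB = 1_000_000
TARGET_MB_PER_SAMPLE = 5          # ~5 Mb of exonic sequence per sample, on 3 arrays
N_SAMPLES = 473                   # lower bound: the abstract says ">473"
BP_PER_FALSE_POSITIVE = 500_000   # ~1 false positive per 500,000 bp
FALSE_NEGATIVE_PERCENT = 10       # ~10% of true variants missed


def total_sequenced_mb(mb_per_sample: int, n_samples: int) -> int:
    """Total sequence interrogated across all samples, in Mb."""
    return mb_per_sample * n_samples


def expected_false_positives(total_mb: int, bp_per_fp: int) -> float:
    """Expected number of false-positive base calls over total_mb of sequence."""
    return total_mb * BP_PER_MB / bp_per_fp


def expected_detected(true_variants: int, fn_percent: int) -> int:
    """True variants expected to be called, given a false-negative rate in percent."""
    return true_variants * (100 - fn_percent) // 100


if __name__ == "__main__":
    total = total_sequenced_mb(TARGET_MB_PER_SAMPLE, N_SAMPLES)
    print(f"~{total:,} Mb sequenced")  # consistent with the reported >2,350 Mb
    fps = expected_false_positives(2_350, BP_PER_FALSE_POSITIVE)
    print(f"~{fps:,.0f} false-positive calls expected over 2,350 Mb")
    # Hypothetical input: 100 true variants present in the targeted regions.
    detected = expected_detected(100, FALSE_NEGATIVE_PERCENT)
    print(f"{detected} of 100 true variants expected to be detected")
```

Under these assumptions, 5 Mb × 473 samples gives about 2,365 Mb, in line with the reported >2,350 Mb, and a per-base false-positive rate of ~1 in 500,000 still corresponds to roughly 4,700 false-positive calls at that scale.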