Okou David T, Locke Adam E, Steinberg Karyn M, Hagen Katie, Athri Prashanth, Shetty Amol C, Patel Viren, Zwick Michael E
Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA.
Ann Hum Genet. 2009 Sep;73(Pt 5):502-13. doi: 10.1111/j.1469-1809.2009.00530.x. Epub 2009 Jul 1.
Novel methods of targeted sequencing of unique regions from complex eukaryotic genomes have generated a great deal of excitement, but critical demonstrations of these methods' efficacy with respect to diploid genotype calling and experimental variation are lacking. To address this issue, we optimized microarray-based genomic selection (MGS) for use with the Illumina Genome Analyzer (IGA). A set of 202 fragments (304 kb total) contained within a 1.7 Mb genomic region on human chromosome X was sequenced by MGS/IGA in ten female HapMap samples, generating a total of 2.4 GB of DNA sequence. At a minimum coverage threshold of 5X, 93.9% of all bases and 94.9% of segregating sites were called, while 57.7% of bases (57.4% of segregating sites) were called at a 50X threshold. Data accuracy at known segregating sites was 98.9% at 5X coverage, rising to 99.6% at 50X coverage. Accuracy at homozygous sites was 98.7% at 5X sequence coverage and 99.5% at 50X coverage. Although accuracy at heterozygous sites was modestly lower, it was still over 92% at 5X coverage and increased to nearly 97% at 50X coverage. These data provide the first demonstration that MGS/IGA sequencing can generate the very high-quality sequence data necessary for human genetics research. All sequences generated in this study have been deposited in the NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra, Accession # SRA007913).
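The two kinds of figures reported above, the fraction of bases called at a minimum coverage threshold and the accuracy of calls at known segregating sites, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the assumption that a base counts as "called" whenever its read depth meets the threshold, and the toy depths and genotypes are all hypothetical and chosen only to show how such metrics are defined.

```python
from typing import Dict, Sequence


def fraction_called(depths: Sequence[int], min_coverage: int) -> float:
    """Fraction of positions whose read depth is at least min_coverage.

    Assumption: a position is "called" purely on depth; a real caller
    would typically also apply base- and genotype-quality filters.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    called = sum(1 for depth in depths if depth >= min_coverage)
    return called / len(depths)


def concordance(calls: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of called sites whose genotype matches a reference genotype.

    Only sites present in both dictionaries are compared, so uncalled
    sites lower the call rate but not the accuracy.
    """
    shared = [site for site in calls if site in truth]
    if not shared:
        raise ValueError("no sites in common between calls and truth")
    matches = sum(1 for site in shared if calls[site] == truth[site])
    return matches / len(shared)


# Hypothetical per-base read depths for a tiny target region.
toy_depths = [0, 3, 5, 12, 48, 50, 75, 120]

# Hypothetical genotype calls and reference genotypes (e.g. from HapMap).
toy_calls = {"site1": "AA", "site2": "AG", "site3": "GG"}
toy_truth = {"site1": "AA", "site2": "AA", "site3": "GG", "site4": "CT"}

rate_5x = fraction_called(toy_depths, 5)
rate_50x = fraction_called(toy_depths, 50)
toy_accuracy = concordance(toy_calls, toy_truth)
```

Raising the threshold from 5X to 50X can only lower the fraction of bases called, which is the trade-off the abstract quantifies: fewer calls at higher depth, but higher accuracy among the calls that remain.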