National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
Genome Res. 2010 Oct;20(10):1420-31. doi: 10.1101/gr.106716.110. Epub 2010 Sep 1.
Massively parallel DNA sequencing technologies have greatly increased our ability to generate large amounts of sequencing data at a rapid pace. Several methods have been developed to enrich for genomic regions of interest for targeted sequencing. We have compared three of these methods: Molecular Inversion Probes (MIP), Solution Hybrid Selection (SHS), and Microarray-based Genomic Selection (MGS). Using HapMap DNA samples, we compared each of these methods with respect to their ability to capture an identical set of exons and evolutionarily conserved regions associated with 528 genes (2.61 Mb). For sequence analysis, we developed and used a novel Bayesian genotype-assigning algorithm, Most Probable Genotype (MPG). All three capture methods were effective, but sensitivities (percentage of targeted bases associated with high-quality genotypes) varied for an equivalent amount of pass-filtered sequence: for example, 70% (MIP), 84% (SHS), and 91% (MGS) for 400 Mb. In contrast, all methods yielded similar accuracies of >99.84% when compared to Infinium 1M SNP BeadChip-derived genotypes and >99.998% when compared to 30-fold coverage whole-genome shotgun sequencing data. We also observed a low false-positive rate with all three methods; of the heterozygous positions identified by each of the capture methods, >99.57% agreed with 1M SNP BeadChip, and >98.840% agreed with the whole-genome shotgun data. In addition, we successfully piloted the genomic enrichment of a set of 12 pooled samples via the MGS method using molecular bar codes. We find that these three genomic enrichment methods are highly accurate and practical, with sensitivities comparable to those of 30-fold coverage whole-genome shotgun data.
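To make the sensitivity definition concrete, the following is a minimal sketch of how a Bayesian genotype call and the resulting sensitivity figure could be computed. It is not the MPG algorithm, whose details are not given in the abstract. It assumes a generic textbook model: each read samples one of the two alleles with equal probability, base errors are independent with a fixed `error_rate`, and the prior depends only on whether the genotype is heterozygous (`het_prior`). The function names, the default parameter values, and the `min_posterior` cutoff standing in for "high-quality" are all hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_posteriors(observed_bases, error_rate=0.01, het_prior=0.001):
    """Posterior over the 10 unordered diploid genotypes at a single site.

    Assumed model (illustrative, not MPG): independent reads, a uniform
    per-base error rate, and a prior split evenly within the homozygous
    (4 genotypes) and heterozygous (6 genotypes) classes.
    """
    log_post = {}
    for a, b in combinations_with_replacement(BASES, 2):
        prior = (1.0 - het_prior) / 4 if a == b else het_prior / 6
        log_p = math.log(prior)
        for base in observed_bases:
            # A read comes from allele a or allele b with probability 1/2 each;
            # a miscalled base is one of the other three bases, equally likely.
            p_base = sum(
                0.5 * ((1.0 - error_rate) if base == allele else error_rate / 3)
                for allele in (a, b)
            )
            log_p += math.log(p_base)
        log_post[a + b] = log_p

    # Normalise in log space to avoid underflow at high coverage.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / total for g, v in log_post.items()}


def most_probable(observed_bases, **model_params):
    """Return (genotype, posterior) for the highest-posterior genotype."""
    post = genotype_posteriors(observed_bases, **model_params)
    best = max(post, key=post.get)
    return best, post[best]


def sensitivity(calls, n_targeted, min_posterior=0.99):
    """Percentage of targeted bases that received a high-quality genotype.

    `calls` is a list of (genotype, posterior) pairs for covered positions;
    positions with no reads are simply absent and count against sensitivity.
    """
    confident = sum(1 for _, posterior in calls if posterior >= min_posterior)
    return 100.0 * confident / n_targeted


if __name__ == "__main__":
    pileups = ["AAAAAAAAAA", "AAAAACCCCC", "G"]  # toy data, 4 targeted bases
    calls = [most_probable(p) for p in pileups]
    print(calls)
    print(sensitivity(calls, n_targeted=4))
```

Under these assumptions, a position covered by a single read rarely reaches the posterior cutoff. This is consistent with sensitivity depending on how much pass-filtered sequence each capture method needs to cover the targeted bases adequately.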