Genome Informatics Section, Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
Genome Res. 2011 Sep;21(9):1498-505. doi: 10.1101/gr.123638.111. Epub 2011 Jul 19.
As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ∼30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses of a clinical sample sequenced on two related Illumina platforms, GAII(x) and HiSeq 2000, to a very high depth (126×). We used these data to establish genotype-calling filters that dramatically increase accuracy. We also empirically determined how the callable portion of the genome varies as a function of the amount of sequence data used. These results help provide a "sequencing guide" for future whole-genome sequencing decisions and metrics by which coverage statistics should be reported.
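The dependence of the callable fraction on sequencing depth can be illustrated with a back-of-the-envelope model. The sketch below is not the authors' method, which was empirical. It assumes that per-site read depth follows a Poisson distribution around the mean coverage, and it uses a hypothetical minimum depth of 20 reads as the threshold for calling a genotype. Real data are more dispersed than Poisson (GC bias, mappability), so these figures are best read as optimistic.

```python
import math


def poisson_cdf(k, mean):
    """Return P(X <= k) for X ~ Poisson(mean), summed term by term."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        total += term
    return total


def fraction_at_least(min_depth, mean_coverage):
    """Fraction of sites expected to reach min_depth reads (idealized Poisson model)."""
    if min_depth <= 0:
        return 1.0
    return 1.0 - poisson_cdf(min_depth - 1, mean_coverage)


if __name__ == "__main__":
    MIN_DEPTH = 20  # hypothetical calling threshold, not taken from the paper
    for coverage in (30, 50, 126):
        frac = fraction_at_least(MIN_DEPTH, coverage)
        print(f"mean {coverage}x: {frac:.4%} of sites at >= {MIN_DEPTH} reads")
```

Even under this idealized model, a mean of 30× leaves a measurable share of sites below the threshold. That is why a depth distribution, or the fraction of sites above a stated depth, is a more informative statistic to report than mean coverage alone.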