Vesalius Research Center, Vlaams Instituut voor Biotechnologie (VIB), Leuven, Belgium.
Nat Biotechnol. 2011 Dec 18;30(1):61-8. doi: 10.1038/nbt.2053.
Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. By sequencing the nearly identical genomes of monozygotic twins and treating shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate 290-fold, facilitating the identification of genetic differences between monozygotic twins. We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs.
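The twin-based evaluation described above can be illustrated with a minimal sketch. It is not the authors' software: the single quality-score filter, the `min_quality` threshold, the function name `evaluate_filter`, and the toy call sets are all hypothetical, and specificity is assumed here to mean the fraction of discordant calls that the filter removes. Calls in one twin that also appear in the co-twin are counted as true variants, the rest as errors.

```python
from typing import Dict, Tuple

# A call is keyed by (chromosome, position, alternate allele) and mapped to a
# hypothetical per-call quality score; the abstract does not name the 12 filters.
Call = Tuple[str, int, str]


def evaluate_filter(calls_a: Dict[Call, float],
                    calls_b: Dict[Call, float],
                    min_quality: float) -> Dict[str, float]:
    """Score one quality filter on twin A, using twin B as the truth proxy."""
    true_calls = {k for k in calls_a if k in calls_b}   # shared SNVs
    error_calls = set(calls_a) - true_calls             # discordant SNVs
    kept = {k for k, q in calls_a.items() if q >= min_quality}

    kept_true = len(kept & true_calls)
    kept_err = len(kept & error_calls)

    # Sensitivity: share of true variants that survive the filter.
    sensitivity = kept_true / len(true_calls) if true_calls else float("nan")
    # Specificity (assumed definition): share of errors that the filter removes.
    specificity = ((len(error_calls) - kept_err) / len(error_calls)
                   if error_calls else float("nan"))

    # Error rate = errors per retained call, before and after filtering.
    rate_before = len(error_calls) / len(calls_a) if calls_a else float("nan")
    if not kept:
        fold = float("nan")          # nothing left to measure
    elif kept_err == 0:
        fold = float("inf")          # every error was removed
    else:
        fold = rate_before / (kept_err / len(kept))

    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "fold_reduction": fold}


# Toy data: five calls shared by both twins, two seen only in twin A.
twin_a = {
    ("chr1", 100, "A"): 50.0, ("chr1", 200, "T"): 45.0,
    ("chr2", 300, "G"): 12.0, ("chr2", 400, "C"): 40.0,
    ("chr3", 500, "A"): 8.0,  ("chr3", 600, "T"): 35.0,
    ("chr4", 700, "G"): 30.0,
}
twin_b = {
    ("chr1", 100, "A"): 48.0, ("chr1", 200, "T"): 44.0,
    ("chr2", 300, "G"): 15.0, ("chr2", 400, "C"): 41.0,
    ("chr3", 600, "T"): 33.0,
}

result = evaluate_filter(twin_a, twin_b, min_quality=20.0)
print(result)
```

Sweeping `min_quality` over a range of values and comparing the resulting sensitivity and specificity would be one way to choose a threshold for a single filter; evaluating combinations of filters would repeat the same scoring with the intersection of the calls each filter keeps.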