High quality SNP calling using Illumina data at shallow coverage.

Affiliation

Genome Sciences Centre, BC Cancer Agency, Vancouver BC, Canada.

Publication information

Bioinformatics. 2010 Apr 15;26(8):1029-35. doi: 10.1093/bioinformatics/btq092. Epub 2010 Feb 26.

PMID:20190250

Abstract

Detection of single nucleotide polymorphisms (SNPs) has been a major application in processing second-generation sequencing (SGS) data. In principle, SNPs are called from single-base differences between a reference genome and a sequence generated from SGS short reads of a sample genome. However, this exercise is far from trivial: several parameters related to sequencing quality and/or reference genome properties strongly affect the accuracy of called SNPs, especially in shallow-coverage data. In this work, we present Slider II, an alignment and SNP calling approach whose improved algorithms enable a larger number of called SNPs with a lower false positive rate. In addition to regular alignment and SNP calling, Slider II can optionally use information about known SNPs of a target genome as priors in alignment and SNP calling, enhancing its ability to detect these known SNPs as well as novel SNPs and mutations in their vicinity.
