Pfeifer S P
School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
Swiss Institute of Bioinformatics, Lausanne, Switzerland.
Heredity (Edinb). 2017 Feb;118(2):111-124. doi: 10.1038/hdy.2016.102. Epub 2016 Oct 19.
Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the data it produces contain subtle but complex errors, biases and uncertainties that pose several statistical and computational challenges for the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of both the data produced and the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, and discuss the influence of errors and biases and their implications for downstream analyses. I also provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads, highlighting several sophisticated reference-based methods that represent the current state of the art.