Department of Computer Science, University of Torino, Torino, Italy.
Curr Top Med Chem. 2012;12(12):1320-30. doi: 10.2174/156802612801319007.
Next-generation sequencing (NGS) technologies are rapidly changing the approach to complex genomic studies, opening the way to personalized drug development and personalized medicine. NGS technologies are characterized by a massive throughput of relatively short sequences (30-100 bases), and they are currently the most reliable and accurate method for grouping individuals on the basis of their genetic profiles. The first and crucial step in sequence analysis is converting millions of short sequences (reads) into valuable genetic information by mapping them to a known (reference) genome. New computational methods, designed specifically for the type and amount of data generated by NGS technologies, are replacing earlier, widely used genome alignment algorithms that cannot cope with such massive amounts of data. This review provides an overview of the bioinformatics techniques that have been developed for mapping NGS data onto a reference genome, with a special focus on polymorphism rate and sequence error detection. The different techniques were tested on an appropriately defined dataset to investigate their relative computational costs and usability from a user's perspective. Since NGS platforms interrogate the genome using either the conventional nucleotide space or the more recent color space, this review considers techniques in both nucleotide and color space, emphasizing their similarities and differences.
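To make the nucleotide-space versus color-space distinction and the idea of mapping reads to a reference more concrete, here is a minimal sketch. It assumes SOLiD-style two-base encoding (codes A=0, C=1, G=2, T=3; the color of two adjacent bases is the XOR of their codes) and a toy exact-seed k-mer index with Hamming-distance verification. The function names (`encode`, `decode`, `build_index`, `map_read`) and the toy sequences are hypothetical illustrations, not taken from any of the reviewed tools. Real mappers use far more elaborate indexes and error models. As a simplification, the leading base is taken from the read itself rather than from the sequencing adapter.

```python
from collections import defaultdict

# Assumed SOLiD-style 2-bit codes; the color of two adjacent bases is the
# XOR of their codes (AA/CC/GG/TT -> 0, AC/CA/GT/TG -> 1, and so on).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"  # BASE[i] is the nucleotide whose code is i


def encode(seq):
    """Return (first_base, colors): one color digit per adjacent base pair."""
    seq = seq.upper()
    colors = "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0], colors


def decode(first_base, colors):
    """Naively translate colors back to bases, starting from a known base."""
    bases = [first_base]
    for c in colors:
        bases.append(BASE[CODE[bases[-1]] ^ int(c)])
    return "".join(bases)


def build_index(ref_colors, k):
    """Hypothetical k-mer index: every length-k color string -> its offsets."""
    index = defaultdict(list)
    for i in range(len(ref_colors) - k + 1):
        index[ref_colors[i:i + k]].append(i)
    return index


def map_read(read_colors, ref_colors, index, k, max_mismatches=1):
    """Seed on the first k colors, then verify the whole read by Hamming distance.

    Returns (offset, mismatches) pairs; color offset i covers bases i and i+1,
    so it is also the read's start position in the reference."""
    hits = []
    for pos in index.get(read_colors[:k], []):
        window = ref_colors[pos:pos + len(read_colors)]
        if len(window) < len(read_colors):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read_colors, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTACGA"          # toy reference, not real data
    _, ref_colors = encode(ref)
    index = build_index(ref_colors, k=4)

    base, read = encode("GCAAGGCT")      # read taken from ref[5:13]
    print(map_read(read, ref_colors, index, k=4))    # [(5, 0)]

    # One measurement error changes a single color: the read still maps
    # in color space, but naive decoding corrupts every later base.
    noisy = read[:4] + "1" + read[5:]
    print(map_read(noisy, ref_colors, index, k=4))   # [(5, 1)]
    print(decode(base, noisy))                       # GCAAGTAG

    # A true SNP (A->T at read position 3) changes two adjacent colors.
    _, snp = encode("GCATGGCT")
    print([i for i, (a, b) in enumerate(zip(read, snp)) if a != b])  # [2, 3]
```

The sketch translates the reference into colors instead of decoding the reads. This choice reflects the behavior shown in the usage lines: a single color (measurement) error propagates to every downstream base once decoded, whereas a genuine SNP shows up as two adjacent color changes. That difference is what lets color-space mappers separate sequencing errors from polymorphisms.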