Review of alignment and SNP calling algorithms for next-generation sequencing data.

Authors

Mielczarek M, Szyda J

Affiliation

Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kożuchowska 7, 51-631, Wroclaw, Poland.

Publication

J Appl Genet. 2016 Feb;57(1):71-9. doi: 10.1007/s13353-015-0292-7. Epub 2015 Jun 9.

Abstract

Application of massively parallel sequencing technology has become one of the most important issues in the life sciences. It was therefore crucial to develop bioinformatics tools for processing next-generation sequencing (NGS) data. Currently, two of the most significant tasks are alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, large numbers of reads need to be mapped to the reference genome, so selecting the aligner is an essential step in NGS pipelines. Two main classes of algorithms, based on suffix tries and on hash tables, have been introduced for this purpose. Suffix array-based aligners are memory-efficient and faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower but more sensitive. SNP and genotype callers can likewise be divided into two main approaches: heuristic and probabilistic methods. A variety of software has been developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.
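
To make the two index families named in the abstract concrete, the following is a minimal sketch, not the method of any aligner covered by the review. It assumes exact matching only, a naively built suffix array standing in for the suffix-based family, and hypothetical function names (build_hash_index, hash_lookup, build_suffix_array, suffix_array_lookup) chosen for illustration. Real aligners use compressed indexes and tolerate mismatches and gaps, which is where the speed and sensitivity trade-offs described above arise.

```python
from collections import defaultdict


def build_hash_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hash_lookup(index, reference, read, k):
    """Seed with the read's first k-mer, then verify the full read at each hit."""
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


def build_suffix_array(reference):
    """Naive construction: sort suffix start positions by the suffix text.

    Illustrative only; this is far too slow and memory-hungry for a genome.
    """
    return sorted(range(len(reference)), key=lambda i: reference[i:])


def suffix_array_lookup(suffix_array, reference, read):
    """Binary-search for the first suffix whose prefix is >= the read,
    then collect the contiguous run of suffixes that start with the read."""
    m = len(read)
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if reference[start:start + m] < read:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(suffix_array):
        start = suffix_array[lo]
        if reference[start:start + m] != read:
            break
        hits.append(start)
        lo += 1
    return sorted(hits)


if __name__ == "__main__":
    reference = "ACGTACGTTACG"  # toy reference, assumed for illustration
    read = "ACGT"
    hash_index = build_hash_index(reference, k=3)
    suffix_array = build_suffix_array(reference)
    print(hash_lookup(hash_index, reference, read, k=3))
    print(suffix_array_lookup(suffix_array, reference, read))
```

Both lookups report the same exact-match positions. They differ in what is stored (a table of k-mer positions versus a sorted list of suffix starts) and in how a query is resolved (seed and verify versus binary search).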
