IEEE/ACM Trans Comput Biol Bioinform. 2017 Nov-Dec;14(6):1202-1213. doi: 10.1109/TCBB.2016.2586070. Epub 2016 Jun 29.
Computational genomics is an emerging field that is enabling us to reveal the origins of life and the genetic basis of diseases such as cancer. Next Generation Sequencing (NGS) technologies have unleashed a wealth of genomic information by producing immense amounts of raw data. Before any functional analysis can be applied to these data, read alignment is performed to find the genomic coordinates of the sequenced reads. Alignment algorithms have evolved rapidly alongside advances in sequencing technology, striving for biological accuracy at the expense of increased space and time complexity. Hardware approaches have been proposed to relieve the computational bottlenecks created by the alignment process. Although several hardware approaches have achieved remarkable speedups, most have overlooked important biological features, which has hampered their widespread adoption by the genomics community. In this paper, we provide a brief biological introduction to genomics and NGS. We discuss the most popular next-generation read alignment tools and algorithms. Furthermore, we provide a comprehensive survey of the hardware implementations used to accelerate these algorithms.
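As a rough illustration of what "finding the genomic coordinates" of a read means, the sketch below slides each read along a reference string and reports the offset with the fewest mismatches. It is a deliberately naive toy, not the method of any tool surveyed in the paper: the function name `align_read`, the `max_mismatches` parameter, and the sequences are all hypothetical, and practical aligners rely on indexed data structures rather than a linear scan.

```python
from typing import Optional, Tuple


def align_read(reference: str, read: str,
               max_mismatches: int = 1) -> Optional[Tuple[int, int]]:
    """Return (position, mismatches) for the best placement of `read`
    on `reference`, or None if every placement exceeds `max_mismatches`.

    Naive O(len(reference) * len(read)) scan using Hamming distance,
    so substitutions are tolerated but insertions/deletions are not.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        # Keep the leftmost placement with the fewest mismatches.
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


# Hypothetical toy data: one exact read, one read with a single
# substitution, and one read that does not occur in the reference.
reference = "ACGTTAGCCGATAGGCTA"
reads = ["TAGCCG", "GATTGG", "TTTTTT"]
hits = {read: align_read(reference, read) for read in reads}

for read, hit in hits.items():
    print(read, hit)
```

The quadratic scan is what becomes infeasible at genome scale with millions of reads, which is the kind of bottleneck the abstract says alignment algorithms and hardware accelerators try to address.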