Zhang Gong, Zhang Yongjian, Jin Jingjie
MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, 510632 China.
Chi-Biotech Co. Ltd., Shenzhen, 518000 China.
Phenomics. 2021 Feb 22;1(1):22-30. doi: 10.1007/s43657-020-00008-5. eCollection 2021 Feb.
Aligning the billions of reads generated by next-generation sequencing (NGS) to reference sequences, termed "mapping", is a time-consuming and computationally intensive step in most NGS applications. A fast, accurate and robust mapping algorithm is therefore highly needed. We developed the FANSe3 mapping algorithm, which can map a 30× human whole-genome sequencing (WGS) dataset within 30 min, a 50× human whole-exome sequencing (WES) dataset within 30 s, and a typical mRNA-seq dataset within seconds on a single server node, without the need for any hardware acceleration feature. Like its predecessor FANSe2, FANSe3 can keep its error rate as low as 10⁻⁵ (one in 100,000) in most cases, which is more robust than the Burrows-Wheeler transform-based algorithms. Error allowance hardly affected the identification of a driver somatic mutation in clinically relevant WGS data, and FANSe3 provided robust gene expression profiles regardless of the parameter settings and sequencer used. The novel algorithm, designed for high-performance cloud-computing infrastructures, will break the bottleneck of speed and accuracy in NGS data analysis and promote NGS applications in various fields. The FANSe3 algorithm can be downloaded from the website: http://www.chi-biotech.com/fanse3/.
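To give a sense of scale for the 30× WGS figure, the following back-of-envelope sketch estimates the mapping throughput it implies. The genome size (about 3.1 × 10⁹ bp) and the 150 bp read length are assumptions made here for illustration and are not stated in the abstract; the function name is likewise hypothetical.

```python
# Back-of-envelope throughput implied by "30x human WGS within 30 min".
# Assumptions (NOT from the abstract): ~3.1e9 bp genome, 150 bp reads.

GENOME_SIZE_BP = 3.1e9   # assumed human genome size
READ_LENGTH_BP = 150     # assumed read length
COVERAGE = 30            # from the abstract: 30x WGS
WALL_TIME_S = 30 * 60    # from the abstract: within 30 min


def implied_throughput(genome_bp, coverage, read_len, seconds):
    """Return (total_reads, reads_per_second, bases_per_second)."""
    total_bases = genome_bp * coverage
    total_reads = total_bases / read_len
    return total_reads, total_reads / seconds, total_bases / seconds


total_reads, reads_per_s, bases_per_s = implied_throughput(
    GENOME_SIZE_BP, COVERAGE, READ_LENGTH_BP, WALL_TIME_S
)
print(f"reads to map:  {total_reads:.2e}")
print(f"reads/second:  {reads_per_s:.2e}")
print(f"bases/second:  {bases_per_s:.2e}")
```

Under these assumptions the dataset holds roughly 6 × 10⁸ reads, so finishing within 30 min corresponds to a sustained rate of a few hundred thousand reads per second on the single node. Different read lengths or genome sizes shift the estimate proportionally.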