The Ultrafast and Accurate Mapping Algorithm FANSe3: Mapping a Human Whole-Genome Sequencing Dataset Within 30 Minutes.

Author information

Zhang Gong, Zhang Yongjian, Jin Jingjie

Affiliations

MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, 510632 China.

Chi-Biotech Co. Ltd., Shenzhen, 518000 China.

Publication information

Phenomics. 2021 Feb 22;1(1):22-30. doi: 10.1007/s43657-020-00008-5. eCollection 2021 Feb.

Abstract

Aligning billions of reads generated by next-generation sequencing (NGS) to reference sequences, termed "mapping", is a time-consuming and computationally intensive process in most NGS applications. A fast, accurate and robust mapping algorithm is therefore highly needed. We developed the FANSe3 mapping algorithm, which can map a 30× human whole-genome sequencing (WGS) dataset within 30 min, a 50× human whole-exome sequencing (WES) dataset within 30 s, and a typical mRNA-seq dataset within seconds on a single server node, without the need for any hardware acceleration feature. Like its predecessor FANSe2, FANSe3 keeps the error rate as low as 10⁻⁵ in most cases, which is more robust than Burrows-Wheeler transform-based algorithms. Error allowance hardly affected the identification of a driver somatic mutation in clinically relevant WGS data, and FANSe3 provided robust gene expression profiles regardless of the parameter settings and sequencer used. The novel algorithm, designed for high-performance cloud-computing infrastructures, will break the bottleneck of speed and accuracy in NGS data analysis and promote NGS applications in various fields. The FANSe3 algorithm can be downloaded from the website: http://www.chi-biotech.com/fanse3/.
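To give a sense of scale for the WGS speed and error-rate figures above, the following back-of-envelope sketch converts "30× in 30 min" into a read count and a reads-per-second rate, and estimates how many reads a 10⁻⁵ error rate would affect. It is not taken from the paper: the genome length (about 3.1 × 10⁹ bp), the single-end 150 bp read length, and the reading of "error rate" as the fraction of incorrectly mapped reads are all assumptions made for illustration only.

```python
# Back-of-envelope estimate for the 30x WGS claim in the abstract.
# ASSUMPTIONS (not from the paper): a ~3.1e9 bp human genome, single-end
# 150 bp reads, and "error rate" meaning the fraction of mis-mapped reads.

GENOME_SIZE_BP = 3.1e9   # assumed approximate human genome length
READ_LENGTH_BP = 150     # assumed read length
COVERAGE = 30            # "30x" WGS, from the abstract
WALL_TIME_S = 30 * 60    # "within 30 min", from the abstract
ERROR_RATE = 1e-5        # error rate quoted in the abstract


def reads_for_coverage(coverage: float, genome_bp: float, read_bp: float) -> float:
    """Reads needed so that reads * read_bp / genome_bp equals the coverage."""
    return coverage * genome_bp / read_bp


def summarize() -> dict:
    """Derived read count, mapping rate, and expected mis-mapped reads."""
    n_reads = reads_for_coverage(COVERAGE, GENOME_SIZE_BP, READ_LENGTH_BP)
    return {
        "reads": n_reads,
        "reads_per_second": n_reads / WALL_TIME_S,
        "expected_mismapped_reads": n_reads * ERROR_RATE,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:,.0f}")
```

Under these assumptions the dataset holds roughly 6.2 × 10⁸ reads, which implies a sustained rate of about 3.4 × 10⁵ reads per second on one node, and about 6,200 reads would be mis-mapped at an error rate of 10⁻⁵. Changing the read length or using paired-end data shifts the read count proportionally.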
