

FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications.

Authors

Xiao Chuan-Le, Mai Zhi-Biao, Lian Xin-Lei, Zhong Jia-Yong, Jin Jing-Jie, He Qing-Yu, Zhang Gong

Affiliation

Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China.

Publication

PLoS One. 2014 Apr 17;9(4):e94250. doi: 10.1371/journal.pone.0094250. eCollection 2014.

Abstract

Correct and bias-free interpretation of deep sequencing data depends on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To combine both advantages, we developed an algorithm, FANSe2, with an iterative mapping strategy based on the statistics of real-world sequencing error distribution, which substantially accelerates the mapping without compromising accuracy. Its sensitivity and accuracy are higher than those of the BWT-based algorithms in tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 are experimentally validated, while the previous algorithms produce false positives and false negatives. FANSe2 showed remarkably better consistency with microarray data than most other algorithms in terms of gene expression quantification. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.
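The abstract does not spell out how seed-based mapping or the iterative strategy works, so the following is only a minimal illustrative sketch of the general idea, not FANSe2's actual implementation. It assumes a generic seed-and-extend scheme with non-overlapping seeds and a hypothetical mismatch schedule `(0, 1, 2)`: every read is first tried with the strictest tolerance, and only reads that remain unmapped are retried with a looser one. All function names and parameters are invented for illustration, and indels are not handled.

```python
from collections import defaultdict


def build_index(ref, seed_len):
    """Map every seed_len-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - seed_len + 1):
        index[ref[i:i + seed_len]].append(i)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, ref, index, seed_len, max_mm):
    """Return sorted reference positions where the read aligns with at most
    max_mm mismatches. Non-overlapping seeds locate candidates; each
    candidate is then verified over the full read length."""
    hits = set()
    for off in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[off:off + seed_len]
        for pos in index.get(seed, ()):
            start = pos - off
            if start < 0 or start + len(read) > len(ref):
                continue  # candidate would run off the reference
            if hamming(read, ref[start:start + len(read)]) <= max_mm:
                hits.add(start)
    return sorted(hits)


def iterative_map(reads, ref, seed_len=4, mm_schedule=(0, 1, 2)):
    """Hypothetical iterative strategy: try the strictest mismatch
    tolerance first and retry only the still-unmapped reads with the
    next tolerance. Returns {read: (positions, mismatch_tolerance)}."""
    index = build_index(ref, seed_len)
    result = {}
    pending = list(reads)
    for max_mm in mm_schedule:
        still_unmapped = []
        for read in pending:
            hits = map_read(read, ref, index, seed_len, max_mm)
            if hits:
                result[read] = (hits, max_mm)
            else:
                still_unmapped.append(read)
        pending = still_unmapped
    for read in pending:
        result[read] = ([], None)  # unmapped after all passes
    return result
```

In this sketch, a read containing k mismatches is guaranteed to be found only if it is split into more than k non-overlapping seeds, since at least one seed must then match exactly (the pigeonhole argument). For example, 12-nt reads with `seed_len=4` yield three seeds, which covers up to two mismatches.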


