HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Hong Kong.
School of Science and Technology, The Open University of Hong Kong, Hong Kong.
PeerJ. 2014 Jun 3;2:e421. doi: 10.7717/peerj.421. eCollection 2014.
This paper reports an integrated solution, called BALSA, for the secondary analysis of next-generation sequencing data; it exploits the computational power of the GPU together with intricate memory management to deliver fast and accurate analysis. Going from raw reads to variants (including SNPs and Indels) on a single computing node with a commodity GPU board, BALSA takes 5.5 h to process 50-fold whole genome sequencing data (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing data. BALSA's speed is rooted in parallel algorithms that effectively exploit the GPU to accelerate steps such as alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels, and it achieves competitive variant calling accuracy and sensitivity when compared with an ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and that it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. It also provides a pileup-like SNAPSHOT format which, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing and facilitates the development of apps for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.
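As a rough sanity check on the figures quoted above, the following sketch relates the read count to the stated 50-fold depth and derives an approximate end-to-end throughput. It assumes that "∼750 million" counts read pairs (two 100 bp reads each) and that the human genome is about 3.0 Gbp; neither assumption is stated in the abstract, and the function and constant names are illustrative only.

```python
def fold_coverage(read_pairs: int, read_length_bp: int, genome_size_bp: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    total_bases = read_pairs * 2 * read_length_bp  # two reads per pair
    return total_bases / genome_size_bp


# Figures quoted in the abstract.
READ_PAIRS = 750_000_000   # assumed to be read pairs, not individual reads
READ_LENGTH_BP = 100
RUNTIME_HOURS = 5.5

# Assumption: approximate human genome size, not given in the abstract.
GENOME_SIZE_BP = 3.0e9

coverage = fold_coverage(READ_PAIRS, READ_LENGTH_BP, GENOME_SIZE_BP)

# Average end-to-end throughput over the whole pipeline (raw reads to variants).
total_bases = READ_PAIRS * 2 * READ_LENGTH_BP
bases_per_second = total_bases / (RUNTIME_HOURS * 3600)

print(f"Estimated depth: {coverage:.0f}-fold")
print(f"Average throughput: {bases_per_second / 1e6:.1f} million bases per second")
```

Under these assumptions the read count is consistent with the stated 50-fold depth; if the 750 million figure instead counted individual reads, the estimated depth would be halved.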