
BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

Affiliations

HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Hong Kong.

School of Science and Technology, The Open University of Hong Kong, Hong Kong.

Publication information

PeerJ. 2014 Jun 3;2:e421. doi: 10.7717/peerj.421. eCollection 2014.

DOI:10.7717/peerj.421
PMID:24949238
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4060040/
Abstract

This paper reports BALSA, an integrated solution for the secondary analysis of next-generation sequencing data; it exploits the computational power of the GPU, together with intricate memory management, to deliver fast and accurate analysis. From raw reads to variants (including SNPs and indels), BALSA takes 5.5 h to process 50-fold whole-genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole-exome sequencing, using a single computing node with a commodity GPU board. BALSA's speed is rooted in parallel algorithms that use the GPU to accelerate steps such as alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and indels, and it achieves competitive variant-calling accuracy and sensitivity when compared with an ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and that it is more sensitive for somatic indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, which maintains the same fidelity as BAM for variant calling, enables efficient storage and indexing and facilitates the development of applications for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.
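
The depth and timing figures in the abstract can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes that "∼750 million 100 bp paired-end reads" means 750 million read pairs (two 100 bp reads each) and that the human genome is roughly 3.0 billion bases; the function and constant names are hypothetical.

```python
def fold_coverage(read_pairs: int, read_length_bp: int, genome_size_bp: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by target size."""
    total_bases = read_pairs * 2 * read_length_bp  # two reads per pair
    return total_bases / genome_size_bp


# Assumption: a round figure for the human genome size, not a value from the paper.
HUMAN_GENOME_BP = 3.0e9

# Assumption: 750 million is the number of read PAIRS.
READ_PAIRS = 750_000_000
READ_LENGTH_BP = 100

wgs_depth = fold_coverage(READ_PAIRS, READ_LENGTH_BP, HUMAN_GENOME_BP)

# Rough end-to-end throughput implied by the reported 5.5 h wall-clock time.
total_reads = READ_PAIRS * 2
reads_per_second = total_reads / (5.5 * 3600)

print(f"Estimated depth: {wgs_depth:.0f}x")
print(f"Implied throughput: {reads_per_second:,.0f} reads/s")
```

Under these assumptions the estimate comes out at the 50-fold depth quoted in the abstract, which suggests the read count refers to pairs rather than individual reads.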


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa80/4060040/f6fd883459db/peerj-02-421-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa80/4060040/3a53af8a620f/peerj-02-421-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa80/4060040/c3dda740de73/peerj-02-421-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa80/4060040/8da0c268befb/peerj-02-421-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa80/4060040/dc2a2ce1b83a/peerj-02-421-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa80/4060040/2623d74b127e/peerj-02-421-g005.jpg
