Big Omics Data Experience.

Authors

Kovatch Patricia, Costa Anthony, Giles Zachary, Fluder Eugene, Cho Hyung Min, Mazurkova Svetlana

Affiliation

Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029, 212-241-6500.

Publication information

SC Conf Proc. 2015 Nov;2015. doi: 10.1145/2807591.2807595.

DOI:10.1145/2807591.2807595
PMID:30788464
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6379072/
Abstract

As personalized medicine becomes more integrated into healthcare, the rate at which human genomes are being sequenced is rising quickly together with a concomitant acceleration in compute and storage requirements. To achieve the most effective solution for genomic workloads without re-architecting the industry-standard software, we performed a rigorous analysis of usage statistics, benchmarks and available technologies to design a system for maximum throughput. We share our experiences designing a system optimized for the "Genome Analysis ToolKit (GATK) Best Practices" whole genome DNA and RNA pipeline based on an evaluation of compute, workload and I/O characteristics. The characteristics of genomic-based workloads are vastly different from those of traditional HPC workloads, requiring different configurations of the scheduler and the I/O subsystem to achieve reliability, performance and scalability. By understanding how our researchers and clinicians work, we were able to employ techniques not only to speed up their workflow yielding improved and repeatable performance, but also to make more efficient use of storage and compute resources.
