Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.

Authors

Heath Allison P, Greenway Matthew, Powell Raymond, Spring Jonathan, Suarez Rafael, Hanley David, Bandlamudi Chai, McNerney Megan E, White Kevin P, Grossman Robert L

Affiliations

Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, USA.

Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, USA; Department of Pathology, University of Chicago, Chicago, Illinois, USA.

Publication

J Am Med Inform Assoc. 2014 Nov-Dec;21(6):969-75. doi: 10.1136/amiajnl-2013-002155. Epub 2014 Jan 24.

DOI:10.1136/amiajnl-2013-002155
PMID:24464852
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4215034/
Abstract

BACKGROUND

As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them.

METHODS

Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required.
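The METHODS paragraph describes OpenStack supplying on-demand virtual machines that provide the computational resources a workload needs. As a rough illustration of that idea only, the Python sketch below picks the smallest VM size (a "flavor", OpenStack's term for a VM hardware template) that satisfies a job's CPU, memory and disk request. The flavor names and sizes, the `Flavor` class and the `pick_flavor` helper are all hypothetical; this is not Bionimbus's scheduler or its actual catalog, and a real deployment would query the cloud for its flavor list.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Flavor:
    """A VM hardware template: virtual CPUs, RAM (GB) and root disk (GB)."""
    name: str
    vcpus: int
    ram_gb: int
    disk_gb: int


# Hypothetical flavor catalog (illustrative names and sizes, listed unsorted).
CATALOG: Sequence[Flavor] = (
    Flavor("m1.xlarge", vcpus=8, ram_gb=16, disk_gb=160),
    Flavor("m1.small", vcpus=1, ram_gb=2, disk_gb=20),
    Flavor("m1.2xlarge", vcpus=16, ram_gb=64, disk_gb=320),
    Flavor("m1.large", vcpus=4, ram_gb=8, disk_gb=80),
)


def pick_flavor(vcpus: int, ram_gb: int, disk_gb: int,
                catalog: Sequence[Flavor] = CATALOG) -> Optional[Flavor]:
    """Return the smallest flavor meeting every requirement, or None if none fits.

    "Smallest" is approximated by ordering on (vcpus, ram_gb, disk_gb).
    """
    candidates = [
        f for f in catalog
        if f.vcpus >= vcpus and f.ram_gb >= ram_gb and f.disk_gb >= disk_gb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda f: (f.vcpus, f.ram_gb, f.disk_gb))


if __name__ == "__main__":
    # Hypothetical request: 8 vCPUs, 16 GB RAM, 100 GB scratch disk.
    print(pick_flavor(8, 16, 100))
```

With this hypothetical catalog, a request for 8 vCPUs, 16 GB of RAM and 100 GB of disk resolves to `m1.xlarge`, raising the RAM request to 32 GB resolves to `m1.2xlarge`, and a request nothing can satisfy returns `None`. Ordering by vCPUs first is a simplifying assumption; a real cloud may weigh RAM or disk differently when pricing.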

RESULTS

Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample.
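The per-sample figures in RESULTS (eight CPUs for about 12 h for alignment, and 5 GB to 10 GB of BAM output) scale linearly with cohort size. The Python sketch below turns them into a back-of-envelope estimate of CPU-hours, BAM storage and wall-clock time. The cohort size, the CPU allocation and the helper names are hypothetical, and the model covers only the alignment step, ignoring queueing, I/O contention, reruns and the other pipeline stages (quality control, variant calling, annotation).

```python
# Per-sample figures taken from the abstract.
CPUS_PER_SAMPLE = 8        # CPUs used by the alignment step
HOURS_PER_SAMPLE = 12      # approximate hours per sample
BAM_GB_RANGE = (5, 10)     # reported per-sample BAM size range, in GB


def alignment_estimate(n_samples: int) -> dict:
    """Return total CPU-hours and BAM storage bounds (GB) for n_samples samples."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    cpu_hours = n_samples * CPUS_PER_SAMPLE * HOURS_PER_SAMPLE
    bam_gb_min = n_samples * BAM_GB_RANGE[0]
    bam_gb_max = n_samples * BAM_GB_RANGE[1]
    return {"cpu_hours": cpu_hours, "bam_gb_min": bam_gb_min, "bam_gb_max": bam_gb_max}


def wall_clock_hours(n_samples: int, total_cpus: int) -> int:
    """Hours to align n_samples if total_cpus are split into 8-CPU slots.

    Assumes each sample occupies one whole slot for HOURS_PER_SAMPLE and that
    slots run in back-to-back batches (hypothetical, idealized scheduling).
    """
    slots = total_cpus // CPUS_PER_SAMPLE
    if slots == 0:
        raise ValueError("need at least one 8-CPU slot")
    batches = -(-n_samples // slots)  # ceiling division
    return batches * HOURS_PER_SAMPLE


if __name__ == "__main__":
    # Hypothetical cohort of 100 samples on a hypothetical 256-CPU allocation.
    print(alignment_estimate(100))
    print(wall_clock_hours(100, 256))
```

With the hypothetical inputs above, 100 samples need 9,600 CPU-hours and 500 GB to 1,000 GB of BAM storage; on 256 CPUs (32 concurrent 8-CPU slots) they finish in four 12-hour batches, about 48 h.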

CONCLUSIONS

Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3e0/4215034/f123f732bf90/amiajnl-2013-002155f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3e0/4215034/efdcdf50a266/amiajnl-2013-002155f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3e0/4215034/51e7d0703a38/amiajnl-2013-002155f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3e0/4215034/679a09ec0a76/amiajnl-2013-002155f04.jpg

Similar articles

1. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.
   J Am Med Inform Assoc. 2014 Nov-Dec;21(6):969-75. doi: 10.1136/amiajnl-2013-002155. Epub 2014 Jan 24.
2. Data Lakes, Clouds, and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data.
   Trends Genet. 2019 Mar;35(3):223-234. doi: 10.1016/j.tig.2018.12.006. Epub 2019 Jan 25.
3. Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.
   PLoS One. 2015 Oct 26;10(10):e0140829. doi: 10.1371/journal.pone.0140829. eCollection 2015.
4. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.
   BMC Bioinformatics. 2011 Aug 30;12:356. doi: 10.1186/1471-2105-12-356.
5. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.
   BMC Bioinformatics. 2012 Mar 19;13:42. doi: 10.1186/1471-2105-13-42.
6. The ISB Cancer Genomics Cloud: A Flexible Cloud-Based Platform for Cancer Genomics Research.
   Cancer Res. 2017 Nov 1;77(21):e7-e10. doi: 10.1158/0008-5472.CAN-17-0617.
7. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.
   BMC Genomics. 2013 Jun 27;14:425. doi: 10.1186/1471-2164-14-425.
8. Closha 2.0: a bio-workflow design system for massive genome data analysis on high performance cluster infrastructure.
   BMC Bioinformatics. 2024 Nov 12;25(1):353. doi: 10.1186/s12859-024-05963-8.
9. WeBrain: A web-based brainformatics platform of computational ecosystem for EEG big data analysis.
   Neuroimage. 2021 Dec 15;245:118713. doi: 10.1016/j.neuroimage.2021.118713. Epub 2021 Nov 17.
10. MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants.
   BMC Bioinformatics. 2017 Jan 20;18(1):49. doi: 10.1186/s12859-016-1454-2.

Cited by

1. Review of open-source software for developing heterogeneous data management systems for bioinformatics applications.
   Bioinform Adv. 2025 Jul 18;5(1):vbaf168. doi: 10.1093/bioadv/vbaf168. eCollection 2025.
2. Unraveling the role of cloud computing in health care system and biomedical sciences.
   Heliyon. 2024 Apr 2;10(7):e29044. doi: 10.1016/j.heliyon.2024.e29044. eCollection 2024 Apr 15.
3. A comprehensive multi-omics analysis reveals molecular features associated with cancer via RNA cross-talks in the Notch signaling pathway.
   Comput Struct Biotechnol J. 2022 Jul 26;20:3972-3985. doi: 10.1016/j.csbj.2022.07.036. eCollection 2022.
4. Data Integration Challenges for Machine Learning in Precision Medicine.
   Front Med (Lausanne). 2022 Jan 25;8:784455. doi: 10.3389/fmed.2021.784455. eCollection 2021.
5. Enhancing PCORnet Clinical Research Network data completeness by integrating multistate insurance claims with electronic health records in a cloud environment aligned with CMS security and privacy requirements.
   J Am Med Inform Assoc. 2022 Mar 15;29(4):660-670. doi: 10.1093/jamia/ocab269.
6. Cloud Computing Enabled Big Multi-Omics Data Analytics.
   Bioinform Biol Insights. 2021 Jul 28;15:11779322211035921. doi: 10.1177/11779322211035921. eCollection 2021.
7. Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services.
   J Am Med Inform Assoc. 2020 Sep 1;27(9):1425-1430. doi: 10.1093/jamia/ocaa068.
8. NRXN1 is associated with enlargement of the temporal horns of the lateral ventricles in psychosis.
   Transl Psychiatry. 2019 Sep 17;9(1):230. doi: 10.1038/s41398-019-0564-9.
9. Shared and distinct genetic risk factors for childhood-onset and adult-onset asthma: genome-wide and transcriptome-wide studies.
   Lancet Respir Med. 2019 Jun;7(6):509-522. doi: 10.1016/S2213-2600(19)30055-4. Epub 2019 Apr 27.
10. Data Lakes, Clouds, and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data.
   Trends Genet. 2019 Mar;35(3):223-234. doi: 10.1016/j.tig.2018.12.006. Epub 2019 Jan 25.