The Einstein Genome Gateway using WASP - a high throughput multi-layered life sciences portal for XSEDE.

Authors

Golden Aaron, McLellan Andrew S, Dubin Robert A, Jing Qiang, O Broin Pilib, Moskowitz David, Zhang Zhengdong, Suzuki Masako, Hargitai Joseph, Calder R Brent, Greally John M

Affiliation

Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA.

Publication

Stud Health Technol Inform. 2012;175:182-91.

Abstract

Massively parallel sequencing (MPS) technologies and their diverse applications in genomics and epigenomics research have yielded enormous new insights into the physiology and pathophysiology of the human genome. The biggest hurdle remains the magnitude and diversity of the datasets generated, which compromise our ability to manage, organize, process and ultimately analyse data. The Wiki-based Automated Sequence Processor (WASP), developed at the Albert Einstein College of Medicine (hereafter Einstein), is unique in tightly coupling the sequencing platform, the sequencing assay, sample metadata and the automated workflows deployed on a heterogeneous high performance computing cluster infrastructure. These workflows yield sequenced, quality-controlled and 'mapped' sequence data, all within a single operating environment accessible through a web-based GUI. WASP at Einstein processes 4-6 TB of data per week and has processed ~1 PB of data overall since its production cycle commenced. It has revolutionized how users interact with these new genomic technologies: they remain blissfully unaware of the data storage, management and, most importantly, processing services that their requests invoke. Abstracting this computational complexity away from the user in effect makes WASP an ideal middleware solution, and an appropriate basis for the development of a grid-enabled resource - the Einstein Genome Gateway - as part of the Extreme Science and Engineering Discovery Environment (XSEDE) program. In this paper we discuss the existing WASP system, its proposed middleware role, and its planned interaction with XSEDE to form the Einstein Genome Gateway.
