Wilkinson Sean R, Almeida Jonas S
Division of Informatics, Department of Pathology, University of Alabama at Birmingham, Birmingham, USA.
BMC Bioinformatics. 2014 Jun 9;15:176. doi: 10.1186/1471-2105-15-176.
Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics' "Big Data" from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine.
QM is an open-source, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analysis of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks, volunteer to execute them, or both, without installing any extra plugins or programs. A client library provides high-level distribution templates, including MapReduce. This stark departure from the current reliance on expensive server hardware running "download and install" software has already attracted substantial community interest: QM received more than 2.2 million API calls from 87 countries in 12 months.
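The post/execute/retrieve pattern described above can be illustrated with a minimal sketch. This is not QM's API or client library: `TaskBoard`, `volunteer`, `map_reduce`, and `suffixes` are hypothetical names, the message board lives in memory rather than behind HTTP, and the three toy sequences merely stand in for genomes. "Shared suffixes" is assumed here to mean the suffixes common to every sequence.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import count
from typing import Any, Callable, Dict, Iterable, Optional, Set


@dataclass
class Task:
    key: int
    func: Callable[[Any], Any]
    arg: Any
    status: str = "waiting"  # waiting -> running -> done
    result: Any = None


class TaskBoard:
    """Hypothetical in-memory stand-in for a task-messaging service."""

    def __init__(self) -> None:
        self._tasks: Dict[int, Task] = {}
        self._ids = count(1)

    def post(self, func: Callable[[Any], Any], arg: Any) -> int:
        """Submitter side: post a task and get back its key."""
        key = next(self._ids)
        self._tasks[key] = Task(key, func, arg)
        return key

    def claim(self) -> Optional[Task]:
        """Volunteer side: take the next waiting task, if any."""
        for task in self._tasks.values():
            if task.status == "waiting":
                task.status = "running"
                return task
        return None

    def report(self, key: int, result: Any) -> None:
        """Volunteer side: send a finished result back to the board."""
        task = self._tasks[key]
        task.result = result
        task.status = "done"

    def fetch(self, key: int) -> Any:
        """Submitter side: retrieve the result of a finished task."""
        task = self._tasks[key]
        if task.status != "done":
            raise LookupError(f"task {key} is still {task.status}")
        return task.result


def volunteer(board: TaskBoard) -> int:
    """Execute waiting tasks until none remain; return how many ran."""
    executed = 0
    while (task := board.claim()) is not None:
        board.report(task.key, task.func(task.arg))
        executed += 1
    return executed


def map_reduce(board: TaskBoard, mapper, reducer, items: Iterable[Any]) -> Any:
    """Post one map task per item, let a volunteer run them, then reduce."""
    keys = [board.post(mapper, item) for item in items]
    volunteer(board)  # assumption: in a real deployment volunteers run elsewhere
    return reduce(reducer, [board.fetch(key) for key in keys])


def suffixes(sequence: str) -> Set[str]:
    """Map step: the set of all non-empty suffixes of one sequence."""
    return {sequence[i:] for i in range(len(sequence))}


# Toy sequences standing in for genomes (illustrative data only).
toy_sequences = ["GATTACA", "CCTACA", "TTTACA"]
shared = map_reduce(TaskBoard(), suffixes, lambda a, b: a & b, toy_sequences)
print(sorted(shared))  # ['A', 'ACA', 'CA', 'TACA']
```

In this sketch the server-side object only stores tasks and results; all computation happens in `volunteer`, which mirrors the claim that few server resources are required.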
QM proved adequate for delivering the kind of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address the critical privacy concerns that characterize biomedical environments.