Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, United States of America.
PLoS Comput Biol. 2011 Aug;7(8):e1002147. doi: 10.1371/journal.pcbi.1002147. Epub 2011 Aug 25.
In this overview of biomedical computing in the cloud, we discussed two primary ways to use the cloud (a single instance or a cluster), provided a detailed example using NGS mapping, and highlighted the associated costs. While many users new to the cloud may assume that entry is as straightforward as uploading an application and selecting an instance type and storage options, we illustrated that substantial up-front effort is required before an application can make full use of the cloud's vast resources. Our intention was to provide a set of best practices and to illustrate how they apply to a typical application pipeline in biomedical informatics, while keeping them general enough to extrapolate to other types of computational problems. Our mapping example was intended to illustrate how to develop a scalable project, not to compare and contrast alignment algorithms for read mapping and genome assembly. Indeed, with a newer aligner such as Bowtie, it is possible to map the entire African genome using one m2.2xlarge instance in 48 hours, for a total cost of approximately $48 in computation time. In our example, we were not concerned with data transfer rates, which are heavily influenced by available bandwidth, connection latency, and network availability. When transferring large amounts of data to the cloud, bandwidth limitations can be a major bottleneck, and in some cases it is more efficient to simply mail a storage device containing the data to AWS (http://aws.amazon.com/importexport/). More information about cloud computing, detailed cost analysis, and security can be found in the references.
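The cost and bandwidth remarks above come down to simple arithmetic, which the following minimal Python sketch makes explicit. It is an illustration under stated assumptions, not part of the original analysis: the $1.00 hourly rate is only what the figures in the text imply ($48 over 48 hours), not a quoted AWS price, and the function names, the 100 GB data set, and the 10 Mbit/s uplink are hypothetical.

```python
def compute_cost(hours: float, hourly_rate_usd: float) -> float:
    """Compute cost of running a single instance for a number of hours."""
    return hours * hourly_rate_usd


def transfer_hours(data_gb: float, bandwidth_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_gb gigabytes over a bandwidth_mbps link.

    Uses decimal units (1 GB = 8e9 bits, 1 Mbit/s = 1e6 bits/s).
    `efficiency` is a hypothetical factor in (0, 1] for the share of the
    nominal bandwidth actually achieved; latency is not modelled.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    total_bits = data_gb * 8e9
    seconds = total_bits / (bandwidth_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Assumption: $1.00/hour, implied by the text's $48 for 48 hours.
    print(f"compute cost: ${compute_cost(48, 1.00):.2f}")
    # Assumption: a 100 GB data set over a 10 Mbit/s uplink (hypothetical values).
    print(f"upload time: {transfer_hours(100, 10):.1f} hours")
```

With these assumed values the upload alone takes roughly 22 hours, a substantial fraction of the 48-hour mapping run, which is the kind of comparison that would favor mailing a storage device for much larger data sets.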