Information Services Platform Laboratory, Universal Communication Research Institute, National Institute of Information and Communications Technology, Kyoto, Japan.
Big Data. 2014 Mar;2(1):23-33. doi: 10.1089/big.2013.0046. Epub 2014 Feb 19.
The digital universe is producing data at an exponentially growing, unprecedented volume, which has brought benefits as well as fundamental challenges for enterprises and scientific communities alike. This trend makes the development and deployment of cloud platforms that support scientific communities curating big data an exciting prospect. The excitement stems from the fact that scientists can now access and extract value from the big data corpus, establish relationships between pieces of information drawn from many types of data, and collaborate with a diverse community of researchers from various domains. Despite these perceived benefits, however, little attention has so far been paid to the people and communities who are both beneficiaries and producers of big data. Understanding the dynamics of communities working with big data, whether scientific or otherwise, is as great a challenge as the technical problems that big data poses. Furthermore, big data platforms for data-intensive research must be designed so that research scientists can easily search for and find data for their research, upload and download datasets for onsite or offsite use, perform computations and analysis, share their findings and research experience, and collaborate seamlessly with their colleagues. In this article, we present the architecture and design of a cloud platform that meets some of these requirements, and a big data curation model that describes how a community of earth and environmental scientists is using the platform to curate data. We also present the motivation for developing the platform, lessons learnt in overcoming some of the challenges of supporting scientists who curate big data, and future research directions.