Avery Paul
Department of Physics, University of Florida, 2029 NPB, PO Box 118440, Gainesville, FL 32611, USA.
Philos Trans A Math Phys Eng Sci. 2002 Jun 15;360(1795):1191-209. doi: 10.1098/rsta.2002.0988.
Twenty-first-century scientific and engineering enterprises are increasingly characterized by their geographic dispersion and their reliance on large data archives. These characteristics bring with them unique challenges. First, the increasing size and complexity of modern data collections require significant investments in information technologies to store, retrieve and analyse them. Second, the increased distribution of people and resources in these projects has made resource sharing and collaboration across significant geographic and organizational boundaries critical to their success. In this paper I explore how computing infrastructures based on Data Grids offer data-intensive enterprises a comprehensive, scalable framework for collaboration and resource sharing. A detailed example of a Data Grid framework is presented for a Large Hadron Collider experiment, where a hierarchical set of laboratory and university resources comprising petaflops of processing power and a multi-petabyte data archive must be efficiently used by a global collaboration. The experience gained with these new information systems, providing transparent managed access to massive distributed data collections, will be applicable to large-scale, data-intensive problems in a wide spectrum of scientific and engineering disciplines, and eventually in industry and commerce. Such systems will be needed in the coming decades as a central element of our information-based society.