Magnuson Diana L, Ruggles Steven
Institute for Social Research and Data Innovation, University of Minnesota, Minneapolis, MN, 55455, USA.
IEEE Ann Hist Comput. 2022 Oct-Dec;44(4):71-83. doi: 10.1109/mahc.2022.3214736.
When it was launched in 1991, the Integrated Public Use Microdata Series (IPUMS) project faced a challenging environment and limited resources. Few datasets were interoperable, and much of the data collected at great public expense was inaccessible to most researchers. Documentation of datasets was nonstandardized, incomplete, and inadequate for automated processing. With insufficient attention to preservation, valuable scientific data was disappearing (see Bogue et al., 1976). IPUMS was established to address these critical issues. At the outset, IPUMS faced daunting barriers of inadequate data processing, storage, and network capacity. This anecdote describes the improvised computational infrastructure developed in the decade from 1989 to 1999 to process, manage, and disseminate the world's largest population datasets. We use a combination of archival sources, interviews, and our own memories to trace the development of the IPUMS computing environment during a period of explosive technical innovation. The development of IPUMS is part of a larger story of the development of social science infrastructure in the late 20th century and its contribution to democratizing data access.