Challenges of Large-Scale Data Processing in the 1990s: The IPUMS Experience.

Authors

Magnuson Diana L, Ruggles Steven

Affiliation

Institute for Social Research and Data Innovation, University of Minnesota, Minneapolis, MN, 55455, USA.

Publication information

IEEE Ann Hist Comput. 2022 Oct-Dec;44(4):71-83. doi: 10.1109/mahc.2022.3214736.

Abstract

When it was launched in 1991, the Integrated Public Use Microdata Series (IPUMS) project faced a challenging environment and limited resources. Few datasets were interoperable and much data collected at great public expense was inaccessible to most researchers. Documentation of datasets was nonstandardized, incomplete, and inadequate for automated processing. With insufficient attention to preservation, valuable scientific data were disappearing (see Bogue et al., 1976). IPUMS was established to address these critical issues. At the outset, IPUMS faced daunting barriers of inadequate data processing, storage, and network capacity. This anecdote describes the improvised computational infrastructure developed in the decade from 1989 to 1999 to process, manage, and disseminate the world's largest population datasets. We use a combination of archival sources, interviews, and our own memories to trace the development of the IPUMS computing environment during a period of explosive technical innovation. The development of IPUMS is part of a larger story of the development of social science infrastructure in the late 20th century and its contribution to democratizing data access.
