Microsimulation of an educational attainment register to predict future record linkage quality.

Author information

Research Methodology Group, University of Duisburg-Essen, 47057 Duisburg, Germany.

Publication information

Int J Popul Data Sci. 2023 Apr 3;8(1):2122. doi: 10.23889/ijpds.v8i1.2122. eCollection 2023.

Abstract

INTRODUCTION

Population-wide educational attainment registers are necessary for educational planning and research. Building and updating such a register requires regular linking of databases. Where unique national identification numbers are unavailable, record linkage must rely on quasi-identifiers such as name, date of birth and sex. However, the data protection principle of data minimization calls for keeping the set of identifiers stored in databases as small as possible.

OBJECTIVES

Therefore, the German Federal Ministry of Education and Research commissioned a study to inform legislation on the minimum set of identifiers required for a national educational register.

METHODS

To justify our recommendations empirically, we implemented a microsimulation of about 20 million people. The simulated register accumulates changes and errors in identifiers due to migration, regional mobility, marriage, school career and mortality, which allows errors in longitudinal datasets to be studied. Updated records were linked yearly to the simulated register using several linkage methods. Clear-text methods were compared with privacy-preserving record linkage (PPRL) methods.
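The yearly simulate-then-link cycle described above can be illustrated with a toy sketch. It is not the study's implementation: it models only one process (a surname change, standing in for marriage), uses a single exact matchkey, and every name in it (`Record`, `age_one_year`, `link_exact`, `run`, `p_name_change`) is hypothetical. Migration, mobility, school career, mortality and typing errors are left out.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    pid: int         # true identity, used only to evaluate links
    surname: str
    birth_date: str
    sex: str


def key(r):
    """Single exact matchkey built from the quasi-identifiers."""
    return (r.surname, r.birth_date, r.sex)


def age_one_year(people, rng, p_name_change):
    """Return next year's true records; some surnames change (assumed rate)."""
    out = []
    for p in people:
        if rng.random() < p_name_change:
            p = replace(p, surname=f"MARRIED{rng.randrange(10**6)}")
        out.append(p)
    return out


def link_exact(register, updates):
    """Link each update to the register if exactly one record shares its key."""
    index = {}
    for r in register:
        index.setdefault(key(r), []).append(r.pid)
    links = []
    for u in updates:
        hits = index.get(key(u), [])
        if len(hits) == 1:
            links.append((hits[0], u.pid))
    return links


def run(n_people=1000, n_years=5, p_name_change=0.03, seed=0):
    """Return the share of people correctly linked in each simulated year."""
    rng = random.Random(seed)
    register = [
        Record(
            i,
            f"NAME{i}",
            f"{rng.randrange(1990, 2010)}-{rng.randrange(1, 13):02d}-{rng.randrange(1, 29):02d}",
            rng.choice("FM"),
        )
        for i in range(n_people)
    ]
    current = register
    recall = []
    for _ in range(n_years):
        current = age_one_year(current, rng, p_name_change)
        links = link_exact(register, current)
        recall.append(sum(a == b for a, b in links) / n_people)
    return recall


if __name__ == "__main__":
    print(run())
```

With a single exact key, a person whose surname has changed can never be linked again, so the share of correct links can only fall from year to year. This is the kind of accumulated loss a longitudinal simulation makes visible.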

RESULTS

The results indicate linkage bias if only the primary identifiers are available in the register. More detailed identifiers, including place of birth, are required to minimize linkage bias. The amount of information available to identify a person for matching is more critical for linkage quality than the record linkage method applied. Differences in linkage quality between the best procedures (probabilistic linkage and multiple matchkeys) are minor.
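A minimal sketch of why an additional identifier can matter more than the linkage method, using the multiple-matchkey idea: matchkeys are tried in order and a link is accepted only when exactly one register record agrees. The field names, the example records and both matchkey lists are assumptions made for illustration. The abstract does not enumerate the primary identifiers or the matchkeys used in the study.

```python
def link_with_matchkeys(register, update, matchkeys):
    """Return the index of the uniquely matching register record, or None."""
    for fields in matchkeys:
        wanted = tuple(update[f] for f in fields)
        hits = [
            i for i, rec in enumerate(register)
            if tuple(rec[f] for f in fields) == wanted
        ]
        if len(hits) == 1:  # accept only unambiguous matches
            return hits[0]
    return None


# Hypothetical register with two people sharing date of birth and sex.
register = [
    {"surname": "Meier", "birth_date": "2001-05-17", "sex": "F", "birth_place": "Essen"},
    {"surname": "Schulz", "birth_date": "2001-05-17", "sex": "F", "birth_place": "Duisburg"},
]

# Update record for the first person after a surname change.
update = {"surname": "Becker", "birth_date": "2001-05-17", "sex": "F", "birth_place": "Essen"}

# Assumed matchkey sets: without and with place of birth.
primary_only = [("surname", "birth_date", "sex"), ("birth_date", "sex")]
with_birth_place = [("surname", "birth_date", "sex"), ("birth_date", "sex", "birth_place")]

print(link_with_matchkeys(register, update, primary_only))
print(link_with_matchkeys(register, update, with_birth_place))
```

In this example the fallback key without place of birth is ambiguous, so the changed surname leaves the record unlinked, while the key that includes place of birth identifies a single candidate.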

CONCLUSIONS

Microsimulation is a valuable tool for designing record linkage procedures. By modelling the processes that cause changes or errors in quasi-identifiers, it seems possible to predict the data quality to be expected once a register has been implemented.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17a6/10463005/bde63be94f98/ijpds-08-2122-g001.jpg
