Francis Paul, Jurak Gregor, Leskošek Bojan, Otte Karen, Prasser Fabian
MPI-SWS, Kaiserslautern, 67663, Germany.
University of Ljubljana, Faculty of Sports, Ljubljana, 1000, Slovenia.
Sci Data. 2025 Sep 18;12(1):1548. doi: 10.1038/s41597-025-05823-x.
One of the many challenges to open science is anonymizing personal data so that it can be shared. This paper presents a case study of the anonymization of a dataset containing cardio-respiratory fitness and commuting patterns for Slovenian school children. It evaluates three anonymization tools: ARX, SDV, and SynDiffix. The fitness study was selected because its small size (N=713) and generally low statistical significance make it particularly challenging to anonymize. Unlike most prior evaluations of anonymization tools, this paper examines whether the scientific conclusions of the original study would have been supported by the anonymized datasets. It also considers the burden that the tools impose on researchers, both for generating the anonymized data and for analyzing it.