Kohlmayer Florian, Prasser Fabian, Eckert Claudia, Kuhn Klaus A
Technische Universität München, University Medical Center (MRI), Ismaninger Strasse 22, 81675 München, Germany.
J Biomed Inform. 2014 Aug;50:62-76. doi: 10.1016/j.jbi.2013.12.002. Epub 2013 Dec 12.
Sensitive biomedical data is often collected from distributed sources, involving different information systems and different organizational units. Local autonomy and legal reasons create a need for privacy-preserving integration concepts. In this article, we focus on anonymization, which plays an important role in the reuse of clinical data and the sharing of research data. We present a flexible solution for anonymizing distributed data in the semi-honest model. Prior to the anonymization procedure, an encrypted global view of the dataset is constructed by means of a secure multi-party computing (SMC) protocol. This global representation can then be anonymized. Our approach is not limited to specific anonymization algorithms but provides pre- and postprocessing for a broad spectrum of algorithms and many privacy criteria. We present an extensive analytical and experimental evaluation and discuss which types of methods and criteria are supported. Our prototype demonstrates the approach by implementing k-anonymity, ℓ-diversity, t-closeness, and δ-presence with a globally optimal de-identification method in horizontally and vertically distributed setups. The experiments show that our method provides highly competitive performance and offers a practical and flexible solution for anonymizing distributed biomedical datasets.
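To illustrate why privacy criteria are evaluated on a global view rather than at each site separately, the following is a minimal sketch of a k-anonymity check over horizontally distributed records. It is not the SMC protocol described in the article: the records are pooled in plaintext purely for illustration, and the function name `is_k_anonymous`, the attribute names, and the sample records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized records held by two sites
# (horizontal distribution: same attributes, different patients).
site_a = [
    {"age": "30-39", "zip": "816**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "816**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "817**", "diagnosis": "flu"},
]
site_b = [
    {"age": "40-49", "zip": "817**", "diagnosis": "diabetes"},
    {"age": "30-39", "zip": "816**", "diagnosis": "flu"},
]

QUASI_IDENTIFIERS = ["age", "zip"]

# Checked locally, site A violates 2-anonymity: one group has a single record.
local_a = is_k_anonymous(site_a, QUASI_IDENTIFIERS, 2)

# Plaintext stand-in for the global view (the article builds it in encrypted form).
global_view = site_a + site_b
global_ok = is_k_anonymous(global_view, QUASI_IDENTIFIERS, 2)

print(local_a, global_ok)
```

In this toy data, the same generalization fails the criterion at site A alone but satisfies it on the combined dataset, which suggests why anonymizing a global representation can retain more detail than anonymizing each source independently.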