A comprehensive tool for creating and evaluating privacy-preserving biomedical prediction models.

Affiliations

School of Medicine, Technical University of Munich, Ismaninger Str. 22, Munich, 81675, Germany.

Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, Berlin, 10178, Germany.

Publication information

BMC Med Inform Decis Mak. 2020 Feb 11;20(1):29. doi: 10.1186/s12911-020-1041-3.

DOI:10.1186/s12911-020-1041-3

PMID:32046701

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7014648/

Abstract

BACKGROUND

Modern data-driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecular data. In this process, privacy protection and regulatory requirements need careful consideration, as the resulting models may leak sensitive personal information. To counter this threat, a wide range of methods for integrating machine learning with formal methods of privacy protection have been proposed. However, there is a significant lack of practical tools for creating and evaluating such privacy-preserving models. In this software article, we report on our ongoing efforts to bridge this gap.

RESULTS

We have extended the well-known ARX anonymization tool for biomedical data with machine learning techniques to support the creation of privacy-preserving prediction models. Our methods are particularly well suited for applications in biomedicine, as they preserve the truthfulness of data (e.g. no noise is added) and they are intuitive and relatively easy to explain to non-experts. Moreover, our implementation is highly versatile, as it supports binomial and multinomial target variables, different types of prediction models and a wide range of privacy protection techniques. All methods have been integrated into a sound framework that supports the creation, evaluation and refinement of models through intuitive graphical user interfaces. To demonstrate the broad applicability of our solution, we present three case studies in which we created and evaluated different types of privacy-preserving prediction models for breast cancer diagnosis, diagnosis of acute inflammation of the urinary system and prediction of the contraceptive method used by women. In this process, we also used a wide range of different privacy models (k-anonymity, differential privacy and a game-theoretic approach) as well as different data transformation techniques.
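
As an illustration of what "preserving the truthfulness of data" can mean for a model such as k-anonymity, the following is a minimal sketch in plain Python. It is not ARX code and does not use the ARX API; the records, attribute names and helper functions are hypothetical. Exact ages are replaced by intervals that still contain the true value (generalization), and k is taken as the size of the smallest group of records that share the same quasi-identifier values.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Replace an exact age with a truthful interval, e.g. 31 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def k_of(records, quasi_identifiers):
    """Size of the smallest group of records sharing all quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical toy records; not taken from the case studies in the article.
records = [
    {"age": 31, "region": "north", "diagnosis": "benign"},
    {"age": 34, "region": "north", "diagnosis": "malignant"},
    {"age": 38, "region": "north", "diagnosis": "benign"},
    {"age": 45, "region": "south", "diagnosis": "benign"},
    {"age": 47, "region": "south", "diagnosis": "malignant"},
]

# Generalize the quasi-identifier 'age'; no value is perturbed or made false.
generalized = [{**r, "age": generalize_age(r["age"])} for r in records]

k_before = k_of(records, ["age", "region"])      # every record is unique
k_after = k_of(generalized, ["age", "region"])   # smallest group has two records

print(k_before, k_after)
```

In this toy example the original table is only 1-anonymous, while the generalized table is 2-anonymous. A prediction model would then be trained on the generalized records rather than on the original ones.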

CONCLUSIONS

With the tool presented in this article, accurate prediction models can be created that preserve the privacy of individuals represented in the training set in a variety of threat scenarios. Our implementation is available as open source software.
