Institute of Computer Science, Johannes Gutenberg University, Mainz, Germany.
Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany.
Stud Health Technol Inform. 2024 Aug 30;317:261-269. doi: 10.3233/SHTI240866.
Retrieving comprehensible rule-based knowledge from medical data by machine learning is a useful task, e.g., for automating the creation of a decision support system. While this has recently been studied by means of exception-tolerant hierarchical knowledge bases (i.e., knowledge bases in which rule-based knowledge is represented on several levels of abstraction), privacy concerns have not yet been addressed extensively in this context. However, privacy plays an important role, especially in medical applications.
When parts of the original dataset can be restored from a learned knowledge base, there may be a practically and legally relevant risk of re-identification for individuals. In this paper, we study privacy issues of exception-tolerant hierarchical knowledge bases that are learned from data. We propose approaches for detecting and eliminating privacy issues in the learned knowledge bases.
We present results for synthetic as well as real-world datasets.
The results show that our approach effectively prevents privacy breaches while only moderately decreasing the inference quality.
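To illustrate the kind of risk described above, the following is a minimal sketch and not the method of the paper. It assumes, purely for illustration, that a rule is a pair of a premise (attribute-value conditions) and a conclusion, and that a rule is treated as a potential privacy issue when fewer than `k` records of the training data match its premise. The names `support`, `risky_rules`, the threshold `k`, and the toy records are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
# Hypothetical rule representation: (premise as attribute -> value, conclusion)
Rule = Tuple[Dict[str, str], str]


def support(premise: Dict[str, str], records: List[Record]) -> int:
    """Count the records that satisfy every condition of the premise."""
    return sum(
        all(record.get(attr) == value for attr, value in premise.items())
        for record in records
    )


def risky_rules(rules: List[Rule], records: List[Record], k: int = 2) -> List[Rule]:
    """Return rules whose premise matches at least one but fewer than k records.

    Assumption for this sketch: such rules describe so few individuals that
    they could reveal parts of the original dataset.
    """
    return [rule for rule in rules if 0 < support(rule[0], records) < k]


# Toy data (invented for illustration only).
records = [
    {"fever": "yes", "age": "30"},
    {"fever": "yes", "age": "45"},
    {"fever": "yes", "age": "90"},
    {"fever": "no", "age": "30"},
]
general_rule: Rule = ({"fever": "yes"}, "flu")
exception_rule: Rule = ({"fever": "yes", "age": "90"}, "pneumonia")

flagged = risky_rules([general_rule, exception_rule], records, k=2)
```

In this toy setting, the general rule is backed by three records, whereas the exception matches a single record and is therefore flagged. Exceptions at lower levels of abstraction are natural candidates for such a check because they tend to cover few individuals.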