Aristodimou Aristos, Antoniades Athos, Pattichis Constantinos S
Department of Computer Science, University of Cyprus, Nicosia, Cyprus.
Healthc Technol Lett. 2016 Mar 23;3(1):16-21. doi: 10.1049/htl.2015.0050. eCollection 2016 Mar.
In healthcare, there is a vast amount of patient data, which could lead to important discoveries if combined. Because of legal and ethical issues, such data cannot be shared, and hence the information is underused. A new area of research has emerged, called privacy preserving data publishing (PPDP), which aims to share data in a way that preserves privacy while keeping the information lost to a minimum. In this Letter, a new anonymisation algorithm for PPDP is proposed, which is based on k-anonymity through pattern-based multidimensional suppression (kPB-MS). The algorithm uses feature selection to reduce the data dimensionality and then combines attribute and record suppression to obtain k-anonymity. Five datasets from different areas of life sciences [retinopathy, single photon emission computed tomography imaging, gene sequencing and drug discovery (two datasets)] were anonymised with kPB-MS. The resulting anonymised datasets were evaluated using four different classifiers, and in 74% of the test cases they yielded accuracies similar to or better than those obtained with the full datasets.
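To make the suppression idea concrete, the following is a minimal Python sketch of k-anonymity obtained by combining attribute suppression with record suppression. It is not the kPB-MS algorithm: the feature selection step and the pattern-based choice of what to suppress are not described in the abstract and are not reproduced here. The function names, the toy records and the choice of `zip` as the attribute to drop are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def pattern(record, quasi_identifiers):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[a] for a in quasi_identifiers)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier pattern occurs in at least k records."""
    counts = Counter(pattern(r, quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def suppress_attributes(records, attributes_to_drop):
    """Attribute suppression: remove whole columns from every record."""
    return [
        {a: v for a, v in r.items() if a not in attributes_to_drop}
        for r in records
    ]


def suppress_records(records, quasi_identifiers, k):
    """Record suppression: drop records whose pattern has fewer than k members."""
    counts = Counter(pattern(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[pattern(r, quasi_identifiers)] >= k]


# Hypothetical toy data; "label" stands in for the class used by a classifier.
records = [
    {"age_band": "30-39", "sex": "F", "zip": "1011", "label": 1},
    {"age_band": "30-39", "sex": "F", "zip": "1012", "label": 0},
    {"age_band": "40-49", "sex": "M", "zip": "1011", "label": 1},
    {"age_band": "40-49", "sex": "M", "zip": "1013", "label": 1},
    {"age_band": "50-59", "sex": "F", "zip": "1014", "label": 0},
]

k = 2

# Step 1 (assumed choice): suppress the attribute that makes patterns unique.
reduced = suppress_attributes(records, {"zip"})
remaining_qi = ["age_band", "sex"]

# Step 2: suppress the records that still violate k-anonymity.
anonymised = suppress_records(reduced, remaining_qi, k)

print(is_k_anonymous(records, ["age_band", "sex", "zip"], k))
print(is_k_anonymous(anonymised, remaining_qi, k))
print(len(records) - len(anonymised), "record(s) suppressed")
```

In this sketch the attribute to drop is fixed by hand. Dropping one column first means only one record has to be removed afterwards, whereas record suppression alone would discard every record in the toy data, which illustrates why combining the two forms of suppression can reduce information loss.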