Fayyoumi Ebaa, Oommen B John
School of Computer Science, Carleton University, Ottawa, ON, Canada.
IEEE Trans Syst Man Cybern B Cybern. 2009 Oct;39(5):1192-205. doi: 10.1109/TSMCB.2009.2013723. Epub 2009 Mar 24.
We consider the microaggregation problem (MAP), which involves partitioning the individual records of a microdata file into a number of mutually exclusive and exhaustive groups. This problem, which seeks the best partition of the microdata file, is known to be NP-hard and has been tackled using many heuristic solutions. In this paper, we present the first reported solution to this problem based on fixed-structure stochastic automata. Compared with state-of-the-art methods, the proposed method yields a lower information loss (IL), a better tradeoff between the IL and the disclosure risk (DR), and a superior value of the scoring index, a criterion that combines the IL and the DR. The scheme has been implemented, tested, and evaluated on a range of real-life and simulated data sets. The results demonstrate that learning automata are applicable to the MAP and can yield a solution with the best tradeoff between IL and DR among the methods compared.
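The abstract does not define the IL, so the following is only a minimal sketch of how a candidate partition might be scored. It assumes the definition commonly used in the microaggregation literature, IL = SSE/SST (within-group sum of squared errors over the total sum of squares), and a minimum group size `k`. The function names and the parameter `k` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence

Record = Sequence[float]


def centroid(records: List[Record]) -> List[float]:
    """Component-wise mean of a non-empty list of records."""
    n = len(records)
    dim = len(records[0])
    return [sum(r[j] for r in records) / n for j in range(dim)]


def sq_dist(a: Record, b: Record) -> float:
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def information_loss(groups: List[List[Record]]) -> float:
    """Assumed IL measure: SSE / SST, in [0, 1]; lower is better."""
    all_records = [r for g in groups for r in g]
    overall = centroid(all_records)
    # SST: squared distance of every record to the global centroid.
    sst = sum(sq_dist(r, overall) for r in all_records)
    # SSE: squared distance of every record to its own group's centroid.
    sse = sum(sq_dist(r, centroid(g)) for g in groups for r in g)
    return sse / sst if sst > 0 else 0.0


def is_valid_partition(groups: List[List[Record]], k: int) -> bool:
    """Assumed constraint: every group holds at least k records."""
    return all(len(g) >= k for g in groups)


# Two partitions of the same four one-dimensional records.
similar_grouped = [[(1.0,), (2.0,)], [(10.0,), (11.0,)]]
dissimilar_grouped = [[(1.0,), (10.0,)], [(2.0,), (11.0,)]]
```

Under this assumed measure, grouping similar records together (`similar_grouped`) gives a much lower IL than mixing distant records (`dissimilar_grouped`). A search procedure for the MAP, whether heuristic or automata-based, can be viewed as looking for a valid partition that minimizes such a score.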