Debruyne Christophe, Pandit Harshvardhan J, Lewis Dave, O'Sullivan Declan
ADAPT Centre, Trinity College Dublin, Dublin 2, Ireland.
Knowl Inf Syst. 2020;62(9):3615-3640. doi: 10.1007/s10115-020-01468-x. Epub 2020 Apr 15.
Data processing is increasingly becoming the subject of various policies and regulations, such as the European General Data Protection Regulation (GDPR), which came into effect in May 2018. One important aspect of the GDPR is informed consent, which captures a person's permission to use their personal information for specific data processing purposes. Organizations must demonstrate that they comply with these policies. The fines for non-compliance are severe enough to have driven research into facilitating compliance verification. The state of the art primarily focuses on, for instance, the analysis of prescriptive models and post hoc analysis of logs to check whether data processing is compliant with the GDPR. We argue that GDPR compliance can be facilitated by ensuring that datasets used in processing activities comply with consent from the very start. The problem addressed in this paper is how to generate datasets that comply with given consent "just in time". We propose RDF and OWL ontologies to represent the consent that an organization has collected and its relationship with data processing purposes. We use this ontology to annotate schemas, allowing us to generate declarative mappings that transform (relational) data into RDF, driven by those annotations. We furthermore demonstrate how we can create compliant datasets by altering the results of the mapping. The use of RDF and OWL allows us to implement the entire process in a declarative manner using SPARQL. We have integrated all components in a service that also captures provenance information for each step, further contributing to the transparency needed to facilitate compliance verification. We demonstrate the approach with a synthetic dataset simulating users (re-)giving, withdrawing, and rejecting their consent for the data processing purposes of systems.
In summary, we argue that the approach facilitates transparency and compliance verification from the start, reducing the need for the post hoc compliance analysis that is common in the state of the art.
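The consent-driven filtering step described above could be sketched in SPARQL roughly as follows. This is a minimal illustration, assuming a hypothetical vocabulary (`ex:hasConsent`, `ex:forPurpose`, `ex:status`) rather than the paper's actual ontology terms:

```sparql
PREFIX ex: <http://example.org/consent#>

# Retain only triples about data subjects whose consent for the
# (hypothetical) Marketing purpose is currently in the "given" state;
# records of subjects who withdrew or rejected consent are dropped.
CONSTRUCT { ?subject ?p ?o }
WHERE {
  ?subject ?p ?o ;
           ex:hasConsent ?consent .
  ?consent ex:forPurpose ex:Marketing ;
           ex:status "given" .
}
```

Because the query is declarative, the same pattern can be regenerated whenever consent states change, which is what allows compliant datasets to be produced "just in time" rather than audited after the fact.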