Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.
BMC Med Inform Decis Mak. 2019 Sep 4;19(1):178. doi: 10.1186/s12911-019-0905-x.
The collection of data and biospecimens that characterize patients and probands in depth is a core element of modern biomedical research. Such data must be considered highly sensitive and needs to be protected from unauthorized use and re-identification. In this context, laws, regulations, guidelines and best practices often recommend or mandate pseudonymization, which means that directly identifying data of subjects (e.g. names and addresses) is stored separately from the data primarily needed for scientific analyses.
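The separation described above can be illustrated with a minimal sketch. It is not the design of any system discussed in this work: the class name `PseudonymizedStore`, its methods and the in-memory dictionaries are hypothetical stand-ins for two separately operated databases, and the `authorized` flag is only a placeholder for a real access control decision.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class PseudonymizedStore:
    """Hypothetical illustration: two stores linked only by a random pseudonym."""

    # Assumption: in a real deployment these would be separate systems
    # with separate access controls, not two dicts in one process.
    identity_store: dict = field(default_factory=dict)  # pseudonym -> identifying data
    research_store: dict = field(default_factory=dict)  # pseudonym -> research data

    def enroll(self, name: str, address: str, measurements: dict) -> str:
        # The pseudonym is random, so it carries no information about the subject.
        pseudonym = secrets.token_hex(16)
        self.identity_store[pseudonym] = {"name": name, "address": address}
        self.research_store[pseudonym] = dict(measurements)
        return pseudonym

    def reidentify(self, pseudonym: str, authorized: bool) -> dict:
        # Placeholder check; a real system would evaluate roles and log the access.
        if not authorized:
            raise PermissionError("re-identification requires authorization")
        return self.identity_store[pseudonym]


store = PseudonymizedStore()
pid = store.enroll("Jane Doe", "1 Example Street", {"hba1c": 6.1})
print(store.research_store[pid])               # research data only, no name or address
print(store.reidentify(pid, authorized=True))  # identifying data, authorized path
```

Even in this toy form, the re-identification path is an extra interface between the two stores, which is the kind of added component whose security implications the following paragraph addresses.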
When (authorized) re-identification of subjects is not an exceptional but a common procedure, e.g. due to longitudinal data collection, implementing pseudonymization can significantly increase the complexity of software solutions. For example, data stored in distributed databases need to be dynamically combined with each other, which requires additional interfaces for communication between the various subsystems. This increased complexity may open up new attack vectors for intruders, which runs counter to the objective of improving data protection. What is lacking is a standardized process for evaluating and reporting risks, threats and countermeasures, which can be used to test whether integrating pseudonymization methods into data collection systems actually improves upon the degree of protection provided by system designs that simply follow common IT security best practices and implement fine-grained role-based access control models. To demonstrate that the methods used to describe systems employing pseudonymized data management are currently heterogeneous and ad hoc, we examined the extent to which twelve recent studies address each of the six basic security properties defined by the International Organization for Standardization (ISO) standard 27000. We show inconsistencies across the studies, with most of them failing to mention one or more security properties.
We discuss the degree of privacy protection provided by implementing pseudonymization in research data collection processes. We conclude that (1) more research is needed on the interplay of pseudonymity, information security and data protection, (2) problem-specific guidelines for evaluating and reporting risks, threats and countermeasures should be developed, and (3) future work on pseudonymized research data collection should include the results of such structured and integrated analyses.