School of Science, STEM College, RMIT University, Melbourne, Australia.
Health & Biomedical Research Information Technology Unit (HaBIC R²), Department of General Practice, Faculty of Medicine, Dentistry and Health Science, The University of Melbourne, VIC 3010, Australia.
Int J Med Inform. 2022 Nov;167:104859. doi: 10.1016/j.ijmedinf.2022.104859. Epub 2022 Aug 29.
Sharing of health data for secondary uses such as research and public policy development is common. There are many potential benefits, but also risks if information about an individual's health record can be inferred. Studies show cautious willingness amongst the public to share health data for beneficial purposes, as long as they are confident in their data privacy and security. There has been relatively little research into whether the technical guarantees of privacy-preserving technologies are well understood by people asked to consent to sharing their data.
We sought to assess how accurately people understood the effectiveness of techniques for protecting the privacy of shared health data.
We designed an online survey describing a data-sharing scenario, motivated by medical research, in which data could be shared in one of four forms: raw (including identifiers), de-identified (using k-anonymity), aggregated, or aggregated with differential privacy applied. Respondents were asked how willing they were to share their data in each form, and how likely they thought it was that they could be identified. They were also asked what 'de-identified' means and whether they would agree to sharing information for 'not solely commercial' purposes, mirroring the consent language used by Australia's My Health Record system.
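For readers unfamiliar with the techniques named in the survey scenario, the following is a minimal illustrative sketch, not the survey instrument or the authors' method. It assumes a toy table of hypothetical records, treats age band and postcode prefix as the quasi-identifiers, and uses the Laplace mechanism as one common way of applying differential privacy to a count. All names (`RECORDS`, `is_k_anonymous`, `aggregate_count`, `dp_count`) and the value of `epsilon` are assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy records: (age_band, postcode_prefix, diagnosis).
# Direct identifiers (name, address) are assumed to be already removed.
RECORDS = [
    ("30-39", "30", "asthma"),
    ("30-39", "30", "diabetes"),
    ("30-39", "30", "asthma"),
    ("40-49", "31", "asthma"),
    ("40-49", "31", "hypertension"),
]


def is_k_anonymous(records, quasi_identifier_indices, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[i] for i in quasi_identifier_indices) for record in records
    )
    return all(count >= k for count in groups.values())


def aggregate_count(records, diagnosis):
    """Aggregated release: the exact number of records with a given diagnosis."""
    return sum(1 for record in records if record[2] == diagnosis)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, diagnosis, epsilon, rng):
    """Differentially private release of the count (Laplace mechanism).

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    return aggregate_count(records, diagnosis) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print("2-anonymous:", is_k_anonymous(RECORDS, (0, 1), 2))
    print("exact asthma count:", aggregate_count(RECORDS, "asthma"))
    print("noisy asthma count:", dp_count(RECORDS, "asthma", epsilon=1.0, rng=rng))
```

The exact aggregate changes by one whenever a single person is added or removed, which is what the added noise is intended to mask; a smaller `epsilon` means more noise and a stronger guarantee.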
Our findings revealed substantial tolerance for researcher use of health data, with a general preference to share data when better privacy-preserving techniques were employed. The preference was not entirely consistent, however: respondents slightly preferred aggregated data over differential privacy, despite differential privacy being objectively more secure. We conjecture this was because differential privacy and its benefits were not well understood. Similarly, respondents showed no consistent understanding of the term 'de-identified', indicating that it needs to be carefully defined in contexts that seek consent. Finally, many respondents who indicated a willingness to share for purposes that were 'not solely commercial' nevertheless rejected at least some specific scenarios that mixed research and commercial objectives, again indicating a possible gap in their understanding of the terms.
We found overall preference for better privacy protection of data as a precondition for secondary use, but limitations in respondents' understanding of key terminology and the differing privacy guarantees of available techniques. Further effort is needed to word secondary data use consent policies to ensure public understanding of commonly used terms and methods, if genuinely informed consent for data sharing is to be gained.