Re-identification Risks in HIPAA Safe Harbor Data: A study of data from one environmental health study.

Author information

Sweeney Latanya, Yoo Ji Su, Perovich Laura, Boronow Katherine E, Brown Phil, Brody Julia Green

Affiliations

Harvard University, Cambridge, MA.

MIT Media Lab, Cambridge, MA.

Publication information

Technol Sci. 2017;2017. Epub 2017 Aug 28.

Abstract

Researchers are increasingly asked to share research data as part of publication and funding processes and to maximize the benefits of publicly funded research. The Safe Harbor provision of the U.S. Health Insurance Portability and Accountability Act (HIPAA) offers guidance to researchers by prescribing how to redact data for public sharing. For example, the provision requires removing explicit identifiers (such as name, address and other personally identifiable information), reporting dates in years, and reducing some or all digits of a postal (or ZIP) code. Is this sufficient? Can research participants still be re-identified in research data that adhere to the HIPAA Safe Harbor standard? In 2006, researchers collected air and dust samples and interviewed residents of 50 homes from Bolinas and Richmond (Atchison Village and Liberty Village), California, to analyze the residents' exposure to pollutants. The study, known as the Northern California Household Exposure Study [1], led to publications that have been cited hundreds of times. We conducted experiments with separate "attacker" and "scorer" teams to see whether we could identify study participants from two versions of the data redacted beyond the HIPAA standard, one in which all dates were reported in ranges of 10 or 20 years and another in which a study participant's birth year was reported exactly. The attackers were blinded to the names and addresses of the participants, and the scorers were blinded to the strategy.
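The redaction steps the abstract names (dropping explicit identifiers, reporting dates as years, truncating ZIP codes, and coarsening years into 10- or 20-year ranges) can be illustrated with a minimal Python sketch. This is not the study's code, and it does not implement the full Safe Harbor rule. The field names, the `redact_record` and `year_to_range` helpers, and the `*` masking character are all hypothetical choices made for illustration.

```python
from datetime import date

# Hypothetical field names; a real dataset would have its own schema,
# and the actual Safe Harbor rule covers more identifier types than these.
EXPLICIT_IDENTIFIERS = {"name", "address", "phone", "email"}


def redact_record(record: dict, zip_digits_kept: int = 3) -> dict:
    """Return a redacted copy of `record` (illustrative only)."""
    redacted = {}
    for field, value in record.items():
        if field in EXPLICIT_IDENTIFIERS:
            continue  # drop explicit identifiers entirely
        if isinstance(value, date):
            redacted[field] = value.year  # report dates as years only
        elif field == "zip":
            kept = str(value)[:zip_digits_kept]
            redacted[field] = kept.ljust(5, "*")  # mask the dropped digits
        else:
            redacted[field] = value
    return redacted


def year_to_range(year: int, width: int = 10) -> str:
    """Coarsen a year into a fixed-width range such as '1960-1969'."""
    start = year - year % width
    return f"{start}-{start + width - 1}"
```

For instance, a record with a birth date of 17 May 1964 and ZIP code 94801 would keep only the year 1964 and the ZIP prefix `948**`, and `year_to_range(1964, 20)` would coarsen the year to `1960-1979`. The study's question is whether records redacted this way, or even further, can still be matched to named individuals.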

Similar articles

3
Evaluation of Privacy Risks of Patients' Data in China: Case Study.
JMIR Med Inform. 2020 Feb 5;8(2):e13046. doi: 10.2196/13046.
6
Re-Identification Risk in HIPAA De-Identified Datasets: The MVA Attack.
AMIA Annu Symp Proc. 2018 Dec 5;2018:1329-1337. eCollection 2018.
8
Creation of clinical research databases in the 21st century: a practical algorithm for HIPAA Compliance.
Surg Infect (Larchmt). 2006 Feb;7(1):37-44. doi: 10.1089/sur.2006.7.37.
9
A software tool for removing patient identifying information from clinical documents.
J Am Med Inform Assoc. 2008 Sep-Oct;15(5):601-10. doi: 10.1197/jamia.M2702. Epub 2008 Jun 25.
10
Household interventions for secondary prevention of domestic lead exposure in children.
Cochrane Database Syst Rev. 2020 Oct 6;10(10):CD006047. doi: 10.1002/14651858.CD006047.pub6.

Cited by

1
FedscGen: privacy-preserving federated batch effect correction of single-cell RNA sequencing data.
Genome Biol. 2025 Jul 22;26(1):216. doi: 10.1186/s13059-025-03684-6.
2
Privacy violations in election results.
Sci Adv. 2025 Mar 14;11(11):eadt1512. doi: 10.1126/sciadv.adt1512. Epub 2025 Mar 12.
3
Privacy preserving strategies for electronic health records in the era of large language models.
NPJ Digit Med. 2025 Jan 16;8(1):34. doi: 10.1038/s41746-025-01429-0.
4
Ethical, Legal, and Social Implications of Gene-Environment Interaction Research.
Genet Epidemiol. 2025 Jan;49(1):e22591. doi: 10.1002/gepi.22591. Epub 2024 Sep 24.
8
Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets.
JCO Clin Cancer Inform. 2023 Sep;7:e2300116. doi: 10.1200/CCI.23.00116.
9
A method for generating synthetic longitudinal health data.
BMC Med Res Methodol. 2023 Mar 23;23(1):67. doi: 10.1186/s12874-023-01869-w.
10
Managing re-identification risks while providing access to the All of Us research program.
J Am Med Inform Assoc. 2023 Apr 19;30(5):907-914. doi: 10.1093/jamia/ocad021.

References

1
The Legal Implications of Report Back in Household Exposure Studies.
Environ Health Perspect. 2016 Nov;124(11):1662-1670. doi: 10.1289/EHP187. Epub 2016 May 6.
2
After the PBDE phase-out: a broad suite of flame retardants in repeat house dust samples from California.
Environ Sci Technol. 2012 Dec 18;46(24):13056-66. doi: 10.1021/es303879n. Epub 2012 Nov 28.
3
Reflexive Research Ethics for Environmental Health and Justice: Academics and Movement-Building.
Soc Mov Stud. 2012;11(2):161-176. doi: 10.1080/14742837.2012.664898. Epub 2012 Apr 2.
4
A systematic review of re-identification attacks on health data.
PLoS One. 2011;6(12):e28071. doi: 10.1371/journal.pone.0028071. Epub 2011 Dec 2.
6
Toxics Use Reduction in the Home: Lessons Learned from Household Exposure Studies.
J Clean Prod. 2011 Mar 1;19(5):438-444. doi: 10.1016/j.jclepro.2010.06.012.
