De-identifying Socioeconomic Data at the Census Tract Level for Medical Research Through Constraint-based Clustering.

Affiliations

Vanderbilt University, Nashville, TN.

Vanderbilt University Medical Center, Nashville, TN.

Publication information

AMIA Annu Symp Proc. 2022 Feb 21;2021:793-802. eCollection 2021.

Abstract

Numerous studies have shown that a person's health status is closely related to their socioeconomic status. Incorporating socioeconomic data associated with a patient's geographic area of residence into clinical datasets is therefore likely to promote medical research. However, in combination, most socioeconomic variables are unique, and they are tied to small geographic regions (e.g., census tracts) that often contain fewer than 20,000 people. Sharing such tract-level data can thus violate the Safe Harbor implementation of de-identification under the Health Insurance Portability and Accountability Act of 1996 (HIPAA), which permits no geographic unit smaller than a state other than the first three digits of a ZIP code, and those only when the area they cover contains more than 20,000 people. In this paper, we introduce a constraint-based k-means clustering approach that generates census tract-level socioeconomic data compliant with de-identification requirements. Our experimental analysis with data from the American Community Survey shows that the approach produces a protected dataset that is highly similar to the unaltered values and achieves substantially better data utility than the HIPAA Safe Harbor recommendation of releasing only 3-digit ZIP codes.
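The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea, not the authors' method. It clusters tracts by their socioeconomic feature vectors with population-weighted k-means (Lloyd's algorithm), then greedily merges any cluster whose total population falls below a floor into the cluster with the nearest centroid, and finally releases each tract's cluster centroid in place of its own values. The 20,000-person floor, the greedy merge repair, the toy data, and the names `constrained_kmeans`, `deidentify`, and `MIN_POPULATION` are all assumptions made for this sketch.

```python
import random
from typing import Dict, List, Sequence

# ASSUMPTION: each released cluster must cover at least 20,000 people,
# mirroring the Safe Harbor 3-digit ZIP code rule; the paper's actual
# constraint may differ.
MIN_POPULATION = 20_000


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(rows: List[Sequence[float]], weights: List[float]) -> List[float]:
    """Population-weighted mean; falls back to a plain mean if all weights are 0."""
    total = sum(weights)
    if total == 0:
        weights, total = [1.0] * len(rows), float(len(rows))
    return [sum(w * r[d] for r, w in zip(rows, weights)) / total
            for d in range(len(rows[0]))]


def centroids_of(labels, features, populations) -> Dict[int, List[float]]:
    """Weighted centroid of every non-empty cluster, keyed by cluster label."""
    groups: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    return {lab: weighted_mean([features[i] for i in idx],
                               [populations[i] for i in idx])
            for lab, idx in groups.items()}


def constrained_kmeans(features, populations, k, min_pop=MIN_POPULATION,
                       iters=100, seed=0) -> List[int]:
    """Hypothetical sketch: weighted k-means, then merge undersized clusters."""
    if sum(populations) < min_pop:
        raise ValueError("total population is below min_pop")
    rng = random.Random(seed)
    init = rng.sample(range(len(features)), min(k, len(features)))
    cents = {c: list(features[i]) for c, i in enumerate(init)}
    labels = None
    # Step 1: Lloyd iterations (assign to nearest centroid, recompute centroids).
    # Clusters that become empty simply disappear.
    for _ in range(iters):
        new = [min(cents, key=lambda c: sq_dist(f, cents[c])) for f in features]
        if new == labels:
            break
        labels = new
        cents = centroids_of(labels, features, populations)
    # Step 2: while the least-populated cluster is below min_pop, merge it
    # into the cluster whose centroid is closest to its own.
    while len(cents) > 1:
        pop = {c: 0 for c in cents}
        for lab, p in zip(labels, populations):
            pop[lab] += p
        small = min(cents, key=lambda c: pop[c])
        if pop[small] >= min_pop:
            break
        target = min((c for c in cents if c != small),
                     key=lambda c: sq_dist(cents[small], cents[c]))
        labels = [target if lab == small else lab for lab in labels]
        cents = centroids_of(labels, features, populations)
    return labels


def deidentify(features, populations, labels) -> List[List[float]]:
    """Release each tract's cluster centroid in place of its own values."""
    cents = centroids_of(labels, features, populations)
    return [list(cents[lab]) for lab in labels]


if __name__ == "__main__":
    # Made-up tracts: [median income ($1,000s), % below poverty line].
    feats = [[42.0, 18.0], [45.0, 16.5], [88.0, 5.0], [91.0, 4.0], [60.0, 11.0]]
    pops = [12_000, 9_500, 11_000, 10_500, 6_000]
    labs = constrained_kmeans(feats, pops, k=3)
    for f, r in zip(feats, deidentify(feats, pops, labs)):
        print(f, "->", [round(v, 1) for v in r])
```

The merge step always satisfies the floor when the total population reaches it, because in the worst case every tract ends up in a single cluster. The cost is that merging can leave fewer than k clusters, which lowers utility; a formulation that builds the population constraint into the assignment step itself may preserve more detail.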
