Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets From Which Protected Health Information Has Been Removed With Use of Machine Learning.

Affiliations

Operations Research Center, Massachusetts Institute of Technology, Cambridge.

Department of Industrial Engineering and Operations Research, University of California, Berkeley.

Publication information

JAMA Netw Open. 2018 Dec 7;1(8):e186040. doi: 10.1001/jamanetworkopen.2018.6040.

DOI:10.1001/jamanetworkopen.2018.6040
PMID:30646312
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6324329/
Abstract

IMPORTANCE

Despite data aggregation and removal of protected health information, there is concern that deidentified physical activity (PA) data collected from wearable devices can be reidentified. Organizations collecting or distributing such data suggest that the aforementioned measures are sufficient to ensure privacy. However, no studies, to our knowledge, have been published that demonstrate the possibility or impossibility of reidentifying such activity data.

OBJECTIVE

To evaluate the feasibility of reidentifying accelerometer-measured PA data, which have had geographic and protected health information removed, using support vector machines (SVMs) and random forest methods from machine learning.

DESIGN, SETTING, AND PARTICIPANTS

In this cross-sectional study, the National Health and Nutrition Examination Survey (NHANES) 2003-2004 and 2005-2006 data sets were analyzed in 2018. The accelerometer-measured PA data were collected in a free-living setting for 7 continuous days. NHANES uses a multistage probability sampling design to select a sample that is representative of the civilian noninstitutionalized household population (both adults and children) of the United States.

EXPOSURES

The NHANES data sets contain objectively measured movement intensity as recorded by accelerometers worn during all waking hours for 1 week.

MAIN OUTCOMES AND MEASURES

The primary outcome was the ability of the random forest and linear SVM algorithms to match demographic and 20-minute aggregated PA data to individual-specific record numbers, and the percentage of correct matches by each machine learning algorithm was the measure.
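
The matching task described above can be sketched in a few lines. This is a toy illustration only, not the authors' pipeline: it assumes per-minute intensity counts as input, uses synthetic data, and substitutes a simple nearest-neighbor matcher for the random forest and linear SVM classifiers so that it runs on the Python standard library alone. The names `aggregate`, `reidentify`, and `simulate_day`, the shortened 120-minute day, and the noise level are all hypothetical.

```python
import random

BIN_MINUTES = 20  # aggregation window named in the abstract


def aggregate(minute_counts, bin_minutes=BIN_MINUTES):
    """Sum per-minute intensity counts into consecutive fixed-width bins."""
    return [
        sum(minute_counts[i:i + bin_minutes])
        for i in range(0, len(minute_counts), bin_minutes)
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reidentify(query, reference):
    """Return the record number whose reference profile is closest to query.

    Stand-in for the classifiers in the study: a 1-nearest-neighbor match.
    """
    return min(reference, key=lambda rid: squared_distance(query, reference[rid]))


def simulate_day(profile, rng, noise=5.0):
    """Hypothetical day of per-minute counts: personal profile plus noise."""
    return [max(0.0, level + rng.gauss(0.0, noise)) for level in profile]


rng = random.Random(0)
minutes_per_day = 120  # assumption: shortened day to keep the example small

# Synthetic "individuals", keyed by a made-up record number.
profiles = {
    record_id: [rng.uniform(0.0, 100.0) for _ in range(minutes_per_day)]
    for record_id in range(1, 21)
}

# One simulated day is kept with its record number; a second day is the
# "deidentified" query that the matcher tries to link back.
reference = {rid: aggregate(simulate_day(p, rng)) for rid, p in profiles.items()}
queries = {rid: aggregate(simulate_day(p, rng)) for rid, p in profiles.items()}

correct = sum(reidentify(q, reference) == rid for rid, q in queries.items())
match_rate = 100.0 * correct / len(queries)
print(f"correct matches: {correct}/{len(queries)} ({match_rate:.1f}%)")
```

The measure printed at the end mirrors the one in the abstract: the percentage of records matched to the correct record number. The demographic variables used in the study are omitted here for brevity.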

RESULTS

A total of 4720 adults (mean [SD] age, 40.0 [20.6] years) and 2427 children (mean [SD] age, 12.3 [3.4] years) in NHANES 2003-2004 and 4765 adults (mean [SD] age, 45.2 [19.9] years) and 2539 children (mean [SD] age, 12.1 [3.4] years) in NHANES 2005-2006 were included in the study. The random forest algorithm successfully reidentified the demographic and 20-minute aggregated PA data of 4478 adults (94.9%) and 2120 children (87.4%) in NHANES 2003-2004 and 4470 adults (93.8%) and 2172 children (85.5%) in NHANES 2005-2006 (P < .001 for all). The linear SVM algorithm successfully reidentified the demographic and 20-minute aggregated PA data of 4043 adults (85.6%) and 1695 children (69.8%) in NHANES 2003-2004 and 4041 adults (84.8%) and 1705 children (67.2%) in NHANES 2005-2006 (P < .001 for all).
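
As a quick arithmetic check, the random forest percentages can be recomputed from the counts reported above. The dictionary layout and the helper name `percent_matched` are illustrative choices; the numbers themselves are copied from the paragraph.

```python
# (correct matches, group size) for the random forest algorithm,
# copied from the Results paragraph.
random_forest = {
    ("2003-2004", "adults"): (4478, 4720),
    ("2003-2004", "children"): (2120, 2427),
    ("2005-2006", "adults"): (4470, 4765),
    ("2005-2006", "children"): (2172, 2539),
}


def percent_matched(correct, total):
    """Percentage of correctly reidentified records, to one decimal place."""
    return round(100.0 * correct / total, 1)


rates = {group: percent_matched(*counts) for group, counts in random_forest.items()}
for (cycle, group), rate in rates.items():
    print(f"NHANES {cycle} {group}: {rate}%")
```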

CONCLUSIONS AND RELEVANCE

This study suggests that current practices for deidentification of accelerometer-measured PA data might be insufficient to ensure privacy. This finding has important policy implications because it appears to show the need for deidentification that aggregates the PA data of multiple individuals to ensure privacy for single individuals.
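
One possible reading of "aggregates the PA data of multiple individuals" is to release only group-level averages. The sketch below is an assumption about what such a step might look like, not a method taken from the study; the function name `group_average` and the choice to fold leftover records into the last group are hypothetical.

```python
def group_average(profiles, group_size):
    """Average equal-length activity profiles over groups of individuals.

    Any leftover records are folded into the last group so that no
    released row ever describes fewer than group_size people.
    """
    if group_size < 2 or len(profiles) < group_size:
        raise ValueError("need group_size >= 2 and at least group_size profiles")
    n_groups = len(profiles) // group_size
    released = []
    for g in range(n_groups):
        start = g * group_size
        end = len(profiles) if g == n_groups - 1 else start + group_size
        group = profiles[start:end]
        released.append([sum(values) / len(group) for values in zip(*group)])
    return released


# Five made-up two-bin profiles, released in groups of at least two.
example = [[1, 3], [3, 5], [10, 10], [20, 30], [30, 50]]
print(group_average(example, 2))
```

Whether averaging of this kind is enough to prevent reidentification is not something the abstract establishes; the sketch only makes the idea concrete.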

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1b6/6324329/6db0b95aedcb/jamanetwopen-1-e186040-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a1b6/6324329/ca1900ec6e00/jamanetwopen-1-e186040-g001.jpg
