Predicting age and gender from network telemetry: Implications for privacy and impact on policy.

Affiliations

Department of Computer Science, Illinois Institute of Technology, Chicago, IL, United States of America.

Department of Social Sciences, Illinois Institute of Technology, Chicago, IL, United States of America.

Publication information

PLoS One. 2022 Jul 21;17(7):e0271714. doi: 10.1371/journal.pone.0271714. eCollection 2022.

DOI:10.1371/journal.pone.0271714
PMID:35862447
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9302812/
Abstract

The systematic monitoring of private communications through the use of information technology pervades the digital age. One result of this is the potential availability of vast amounts of data tracking the characteristics of mobile network users. Such data is becoming increasingly accessible for commercial use, which raises questions about the degree to which personal information can be protected. Existing regulations may require the removal of personally identifiable information (PII) from datasets before they can be processed, but research now suggests that powerful machine learning classification methods are capable of targeting individuals for personalized marketing purposes, even in the absence of PII. This study aims to demonstrate how machine learning methods can be deployed to extract demographic characteristics. Specifically, we investigate whether key demographics (gender and age) of mobile users can be accurately identified by third parties using deep learning techniques based solely on observations of the user's interactions within the network. Using an anonymized dataset from a Latin American country, we show the relative ease with which PII in terms of the age and gender demographics can be inferred; specifically, our neural network model generates an estimate for gender with an accuracy rate of 67%, outperforming decision tree, random forest, and gradient boosting models by a significant margin. Neural networks achieve an even higher accuracy rate of 78% in predicting the subscriber age. These results suggest the need for a more robust regulatory framework governing the collection of personal data to safeguard users from predatory practices motivated by fraudulent intentions, prejudices, or consumer manipulation. We discuss in particular how advances in machine learning have chiseled away at a number of General Data Protection Regulation (GDPR) articles designed to protect consumers from the imminent threat of privacy violations.
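The accuracy rates quoted in the abstract are fractions of correctly predicted labels. The sketch below is an illustration only, not the authors' code or data: it computes that fraction for a made-up gender classifier and, for context, for a majority-class baseline that always predicts the most common training label. The function names `accuracy` and `majority_baseline` and all label values are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must be non-empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for each of n test cases."""
    most_common_label, _count = Counter(y_train).most_common(1)[0]
    return [most_common_label] * n


# Hypothetical toy labels, NOT data from the study.
y_train = ["F", "M", "M", "F", "M", "M"]
y_test = ["M", "F", "M", "M", "F", "M"]
y_model = ["M", "F", "F", "M", "F", "M"]  # made-up classifier output

model_acc = accuracy(y_test, y_model)
baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))

print(f"model accuracy:    {model_acc:.2f}")
print(f"baseline accuracy: {baseline_acc:.2f}")
```

Comparing against a baseline of this kind is one common way to judge whether a reported accuracy reflects real predictive signal or merely class imbalance; the abstract itself reports comparisons against decision tree, random forest, and gradient boosting models.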

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f50e/9302812/2fcdeccb0e5f/pone.0271714.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f50e/9302812/562cc070466a/pone.0271714.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f50e/9302812/2ea1b0686f74/pone.0271714.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f50e/9302812/4eb00fa15807/pone.0271714.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f50e/9302812/4ab2bbe42b9e/pone.0271714.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f50e/9302812/0e96635d3f75/pone.0271714.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f50e/9302812/89b087f203fd/pone.0271714.g007.jpg
