Department of Management and Information Systems, Kent State University, Kent, Ohio, USA.
Department of Management and Information Systems, Kent State University at Tuscarawas, New Philadelphia, Ohio, USA.
Big Data. 2024;12(3):213-228. doi: 10.1089/big.2022.0182. Epub 2023 Aug 14.
When users interact with their mobile devices, they leave behind unique digital footprints that can serve as predictive proxies for an array of user characteristics, including demographics. Predicting users' demographics from mobile usage can benefit both service providers and users by improving customer targeting, service personalization, and market research. This study applies machine learning algorithms to mobile usage data from 235 demographically diverse users to examine how accurately their sociodemographic attributes (age, gender, income, and education) can be predicted from mobile usage metadata. It fills a gap in the current literature by quantifying the predictive power of each attribute and discussing the practical applications and privacy implications. According to the results, gender is the attribute predicted most accurately from mobile usage footprints (balanced accuracy = 0.862), whereas predicting users' education level is more challenging (balanced accuracy = 0.719). Moreover, the classification models were able to classify users by whether their age or income was above or below a certain threshold with acceptable accuracy. The study also presents practical applications of inferring demographic attributes from mobile usage data and discusses the implications of the findings, such as privacy and discrimination risks, from the perspectives of different stakeholders.
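The abstract reports balanced accuracy without defining it. Under the standard definition, balanced accuracy is the mean of the per-class recalls, which keeps a classifier from scoring well simply by favoring the majority class. The sketch below illustrates that definition only; the function name and the toy labels are hypothetical and are not the study's data, features, or models.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not taken from the study).
y_true = ["F"] * 8 + ["M"] * 2
y_pred = ["F"] * 10  # a classifier that always predicts the majority class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case plain accuracy is 0.8 even though the minority class is never identified, while balanced accuracy drops to 0.5, the level expected from an uninformative binary classifier.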