Big Data, Small Personas: How Algorithms Shape the Demographic Representation of Data-Driven User Segments.

Affiliations

School of Marketing and Communication, University of Vaasa, Finland.

Fulda University of Applied Sciences, Fulda, Germany.

Publication information

Big Data. 2022 Aug;10(4):313-336. doi: 10.1089/big.2021.0177.

DOI:10.1089/big.2021.0177

PMID:35969694

Abstract

Derived from the notion of algorithmic bias, it is possible that creating user segments such as personas from data results in over- or under-representing certain segments (FAIRNESS), does not properly represent the diversity of the user populations (DIVERSITY), or produces inconsistent results when hyperparameters are changed (CONSISTENCY). Collecting user data on 363M video views from a global news and media organization, we compare personas created from this data using different algorithms. Results indicate that the algorithms fall into two groups: those that generate personas with high diversity and those that generate personas with high fairness. The algorithms that rank high on diversity tend to rank low on fairness (Spearman's correlation: -0.83). The algorithm that best balances diversity, fairness, and consistency is Spectral Embedding. The results imply that the choice of algorithm is a crucial step in data-driven user segmentation, because the algorithm fundamentally impacts the demographic attributes of the generated personas and thus influences how decision makers view the user population. The results have implications for algorithmic bias in user segmentation and creating user segments that not only consider commercial segmentation criteria but also consider criteria derived from ethical discussions in the computing community.
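The trade-off reported in the abstract is expressed as a Spearman rank correlation between how algorithms rank on diversity and how they rank on fairness. The sketch below shows how such a coefficient can be computed from two rankings. It is only an illustration: the five algorithm labels and their ranks are hypothetical placeholders and are not taken from the study, and the formula used assumes there are no tied ranks.

```python
# Minimal sketch of a Spearman rank correlation between two rankings.
# ASSUMPTION: the labels and ranks below are made up for illustration;
# they are NOT the study's data and do not reproduce its -0.83 result.

def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho for two rankings of the same items, assuming no ties."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same items")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Sum of squared rank differences across items.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    # Closed-form formula, valid when ranks contain no ties.
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical ranks (1 = best) for five placeholder algorithms A..E.
algorithms = ["A", "B", "C", "D", "E"]
diversity_rank = [1, 2, 3, 4, 5]
fairness_rank = [5, 3, 4, 2, 1]

rho = spearman_rho(diversity_rank, fairness_rank)
print(f"Spearman's rho: {rho:.2f}")
```

A strongly negative coefficient, as in this made-up example, means that items ranked high on one criterion tend to be ranked low on the other, which is the pattern the abstract describes for diversity and fairness.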

Similar articles

Generating user-driven patient personas to support preventive health care activities of rural-living unattached patients.

PEC Innov. 2024 Mar 13;4:100274. doi: 10.1016/j.pecinn.2024.100274. eCollection 2024 Dec.

User Personas for a "Better Design" of Nation-Wide EHRs Based on Thorough Expert Evaluation and Field Analysis: Modeling Users as Individuals Plus Family Members for an Enhanced Mapping of Healthcare Situations.

Stud Health Technol Inform. 2024 Apr 26;313:87-92. doi: 10.3233/SHTI240017.

User profiles and personas in the design and development of consumer health technologies.

Int J Med Inform. 2013 Nov;82(11):e251-68. doi: 10.1016/j.ijmedinf.2011.03.006. Epub 2011 Apr 9.

A joint fairness model with applications to risk predictions for underrepresented populations.

Biometrics. 2023 Jun;79(2):826-840. doi: 10.1111/biom.13632. Epub 2022 Mar 27.

FairRankVis: A Visual Analytics Framework for Exploring Algorithmic Fairness in Graph Mining Models.

IEEE Trans Vis Comput Graph. 2022 Jan;28(1):368-377. doi: 10.1109/TVCG.2021.3114850. Epub 2021 Dec 24.

Survey-based personas for a target-group-specific consideration of elderly end users of information and communication systems in the German health-care sector.

Int J Med Inform. 2019 Dec;132:103924. doi: 10.1016/j.ijmedinf.2019.07.003. Epub 2019 Aug 11.

Personas: stepping into the shoes of the library user.

Med Ref Serv Q. 2013;32(4):443-50. doi: 10.1080/02763869.2013.837737.

Perception of fairness in algorithmic decisions: Future developers' perspective.

Patterns (N Y). 2021 Nov 3;3(1):100380. doi: 10.1016/j.patter.2021.100380. eCollection 2022 Jan 14.

Creating personas for exposome research: the experience from the HEAP project.

Open Res Eur. 2023 Feb 7;3:28. doi: 10.12688/openreseurope.15474.1. eCollection 2023.

Cited by

Client Perspectives of Case Stories in Internet-Delivered Cognitive Behavioral Therapy for Public Safety Personnel: Mixed Methods Study.

JMIR Form Res. 2024 Oct 25;8:e64454. doi: 10.2196/64454.
