School of Marketing and Communication, University of Vaasa, Finland.
Fulda University of Applied Sciences, Fulda, Germany.
Big Data. 2022 Aug;10(4):313-336. doi: 10.1089/big.2021.0177.
Drawing on the notion of algorithmic bias, we posit that creating user segments such as personas from data can over- or under-represent certain segments (FAIRNESS), fail to properly represent the diversity of the user population (DIVERSITY), or produce inconsistent results when hyperparameters are changed (CONSISTENCY). Collecting user data on 363M video views from a global news and media organization, we compare personas created from this data using different algorithms. Results indicate that the algorithms fall into two groups: those that generate personas with higher diversity and those that generate personas with higher fairness. The algorithms that rank high on diversity tend to rank low on fairness (Spearman's correlation: -0.83). The algorithm that best balances diversity, fairness, and consistency is Spectral Embedding. The results imply that the choice of algorithm is a crucial step in data-driven user segmentation, because the algorithm fundamentally affects the demographic attributes of the generated personas and thus influences how decision makers view the user population. The results have implications for algorithmic bias in user segmentation and for creating user segments that consider not only commercial segmentation criteria but also criteria derived from ethical discussions in the computing community.
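To make the kind of comparison described above concrete, the sketch below is a minimal illustration, not the authors' pipeline: the clustering algorithms, the toy interaction data, the demographic attribute, and the diversity/fairness metrics are all illustrative assumptions standing in for the paper's persona-generation setup and its evaluation criteria.

```python
# Illustrative sketch: compare clustering algorithms for persona generation and
# check how their diversity and fairness scores relate across algorithms.
# All data and metric definitions here are assumptions for demonstration only.
import numpy as np
from scipy.stats import spearmanr
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(0)
# Toy user-interaction matrix (users x content items), standing in for video-view data.
X = rng.poisson(lam=2.0, size=(500, 40)).astype(float)
groups = rng.integers(0, 4, size=500)  # illustrative demographic attribute per user
n_personas = 5

def diversity(labels, groups):
    """Fraction of demographic groups that are the majority group of at least one persona."""
    majority = {int(np.bincount(groups[labels == k]).argmax()) for k in np.unique(labels)}
    return len(majority) / len(np.unique(groups))

def fairness(labels, groups):
    """1 minus the mean deviation of within-persona group shares from the overall shares."""
    n_groups = groups.max() + 1
    overall = np.bincount(groups, minlength=n_groups) / len(groups)
    devs = []
    for k in np.unique(labels):
        in_k = groups[labels == k]
        shares = np.bincount(in_k, minlength=n_groups) / max(len(in_k), 1)
        devs.append(np.abs(shares - overall).mean())
    return 1.0 - float(np.mean(devs))

algorithms = {
    "KMeans": lambda X: KMeans(n_personas, n_init=10, random_state=0).fit_predict(X),
    "Agglomerative": lambda X: AgglomerativeClustering(n_personas).fit_predict(X),
    "SpectralEmbedding+KMeans": lambda X: KMeans(n_personas, n_init=10, random_state=0)
        .fit_predict(SpectralEmbedding(n_components=3, random_state=0).fit_transform(X)),
    "SpectralClustering": lambda X: SpectralClustering(n_personas, random_state=0).fit_predict(X),
}

div_scores, fair_scores = [], []
for name, fit in algorithms.items():
    labels = fit(X)
    div_scores.append(diversity(labels, groups))
    fair_scores.append(fairness(labels, groups))
    print(f"{name:26s} diversity={div_scores[-1]:.2f} fairness={fair_scores[-1]:.2f}")

# Rank correlation between the two criteria across algorithms
# (analogous to the abstract's reported Spearman correlation of -0.83).
rho, _ = spearmanr(div_scores, fair_scores)
print(f"Spearman correlation (diversity vs. fairness): {rho:.2f}")
```

In a real study, the interaction matrix would come from logged view data, the demographic attribute from user profiles, and the metrics from the paper's own operationalization of fairness, diversity, and consistency; the sketch only shows the overall shape of such a comparison.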