Automating the Detection of Linguistic Intergroup Bias Through Computerized Language Analysis.

Authors

Collins Katherine A, Boyd Ryan L

Affiliations

University of Saskatchewan, Saskatoon, SK, Canada.

University of Texas at Dallas, Richardson, TX, USA.

Publication details

J Lang Soc Psychol. 2025 Feb 24;44(3-4):343-366. doi: 10.1177/0261927X251318887. eCollection 2025 Jun-Sep.

DOI:10.1177/0261927X251318887

PMID:40291764

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12027610/

Abstract

Linguistic bias is the differential use of abstraction, or other linguistic mechanisms, for the same behavior when it is performed by members of different groups. Abstraction is defined by the Linguistic Category Model (LCM), which places words on a continuum from concrete to abstract. Linguistic Intergroup Bias (LIB) describes the tendency to use abstract words for undesirable outgroup and desirable ingroup behavior, and concrete words for desirable outgroup and undesirable ingroup behavior. Thus, by examining abstraction in a text, we can infer the implicit attitudes of its author. Yet research is currently held back by manual coding, which is time-consuming and resource-intensive. In this study, we aim to develop an automated method for coding LIB. We compiled various techniques, including different forms of sentence tokenization, sentiment analysis, and abstraction coding. All methods produced scores that closely approximated manually coded scores, which is promising and suggests that more complex methods for LIB coding may be unnecessary. We recommend automated approaches using CoreNLP sentiment analysis and LCM Dictionary abstraction coding.
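The abstract names three components: sentence tokenization, sentiment analysis, and LCM-based abstraction coding. The Python sketch below shows one plausible way they could be combined into a single LIB score. It is not the authors' pipeline. The regex sentence splitter, the hypothetical `TOY_LCM_DICT` and `TOY_VALENCE` lexicons (tiny stand-ins for the LCM Dictionary and for CoreNLP sentiment), the weights (the common convention of scoring descriptive action verbs at 1 through adjectives at 4, with nouns assumed at 5), and the `lib_index` difference score are all assumptions made for illustration.

```python
import re
from statistics import mean

# HYPOTHETICAL toy dictionary mapping words to LCM categories.
# A real LCM dictionary is far larger; these entries are illustrative only.
TOY_LCM_DICT = {
    "hits": "DAV", "helps": "DAV", "talks": "DAV",
    "cheats": "IAV", "hurts": "IAV",
    "hates": "SV", "admires": "SV",
    "kind": "ADJ", "honest": "ADJ", "aggressive": "ADJ",
    "bully": "NOUN", "hero": "NOUN",
}
# Assumed weights: concrete (1) to abstract (5), nouns placed at the top.
LCM_WEIGHTS = {"DAV": 1, "IAV": 2, "SV": 3, "ADJ": 4, "NOUN": 5}

# HYPOTHETICAL toy valence lexicon standing in for a real sentiment
# analyzer such as CoreNLP.
TOY_VALENCE = {
    "helps": 1, "kind": 1, "honest": 1, "hero": 1, "admires": 1,
    "hits": -1, "cheats": -1, "hurts": -1, "hates": -1,
    "aggressive": -1, "bully": -1,
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence):
    """Lowercased alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def abstraction(sentence):
    """Mean LCM weight of dictionary words, or None if no word matches."""
    levels = [LCM_WEIGHTS[TOY_LCM_DICT[w]]
              for w in words(sentence) if w in TOY_LCM_DICT]
    return mean(levels) if levels else None


def valence(sentence):
    """'desirable', 'undesirable', or None (neutral) from summed lexicon scores."""
    total = sum(TOY_VALENCE.get(w, 0) for w in words(sentence))
    if total > 0:
        return "desirable"
    if total < 0:
        return "undesirable"
    return None


def lib_index(ingroup_text, outgroup_text):
    """Mean abstraction of LIB-consistent sentences minus LIB-inconsistent ones.

    Positive values indicate the LIB pattern; None if either set is empty.
    """
    consistent, inconsistent = [], []
    for group, text in (("ingroup", ingroup_text), ("outgroup", outgroup_text)):
        for s in split_sentences(text):
            a, v = abstraction(s), valence(s)
            if a is None or v is None:
                continue  # skip sentences with no coded words or neutral valence
            # LIB-consistent: desirable ingroup or undesirable outgroup behavior.
            if (group == "ingroup") == (v == "desirable"):
                consistent.append(a)
            else:
                inconsistent.append(a)
    if not consistent or not inconsistent:
        return None
    return mean(consistent) - mean(inconsistent)


if __name__ == "__main__":
    ingroup = "She is kind and honest. He hits the boy."
    outgroup = "He is a bully. She helps the boy."
    print(lib_index(ingroup, outgroup))  # → 3.5
```

With these toy inputs the LIB-consistent sentences ("kind and honest", "bully") average 4.5 and the inconsistent ones ("hits", "helps") average 1, giving 3.5. Scoring per sentence and then averaging is one design choice among several; weighting by word count, or letting sentiment scores vary continuously rather than as a sign, would be equally reasonable variants.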
