Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations.

Affiliations

Department of Psychology, Stanford University.

Melbourne Graduate School of Education, The University of Melbourne.

Publication information

Psychol Methods. 2021 Aug;26(4):398-427. doi: 10.1037/met0000349.

Abstract

Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but these approaches have not been comprehensively compared. To provide guidance on best practices for automatically analyzing written text, this narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods: Linguistic Inquiry and Word Count (LIWC), the General Inquirer, DICTION, Latent Dirichlet Allocation, and Differential Language Analysis. We compare the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users. Results are fairly consistent across methods. The closed-vocabulary approaches efficiently summarize concepts and are helpful for understanding how people think, with LIWC2015 yielding the strongest, most parsimonious results. Open-vocabulary approaches reveal more specific and concrete patterns across a broad range of content domains, better address ambiguous word senses, and are less prone to misinterpretation, suggesting that they are well-suited for capturing the nuances of everyday psychological processes. We detail several errors that can occur in closed-vocabulary analyses, the impact of sample size, number of words per user and number of topics included in open-vocabulary analyses, and implications of different analytical decisions. We conclude with recommendations for researchers, advocating for a complementary approach that combines closed- and open-vocabulary methods. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
