Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches.

Authors

Munnes Stefan, Harsch Corinna, Knobloch Marcel, Vogel Johannes S, Hipp Lena, Schilling Erik

Affiliations

WZB Berlin Social Science Center, Berlin, Germany.

Faculty of Economics and Social Sciences, Chair of Inequality Research and Social Stratification Analysis, University of Potsdam, Potsdam, Germany.

Publication information

Front Big Data. 2022 May 4;5:886362. doi: 10.3389/fdata.2022.886362. eCollection 2022.

Abstract

Can we rely on computational methods to accurately analyze complex texts? To answer this question, we compared different dictionary and scaling methods used in predicting the sentiment of German literature reviews to the "gold standard" of human-coded sentiments. Literature reviews constitute a challenging text corpus for computational analysis as they not only contain different text levels (for example, a summary of the work and the reviewer's appraisal) but are also characterized by subtle and ambiguous language elements. To take the nuanced sentiments of literature reviews into account, we worked with a metric rather than a dichotomous scale for sentiment analysis. The results of our analyses show that the predicted sentiments of prefabricated dictionaries, which are computationally efficient and require minimal adaptation, have a low to medium correlation with the human-coded sentiments (r between 0.32 and 0.39). The accuracy of self-created dictionaries using word embeddings (both pre-trained and self-trained) was considerably lower (r between 0.10 and 0.28). Given the high coding intensity of word embeddings, their contingency on seed selection, and their sensitivity to the degree of data pre-processing that we found with our data, we would not recommend them for complex texts without further adaptation. While fully automated approaches appear unable to accurately predict the sentiment of complex texts such as ours, we found relatively high correlations with a semiautomated approach (r of around 0.6), which, however, requires intensive human coding efforts for the training dataset. In addition to illustrating the benefits and limits of computational approaches in analyzing complex text corpora and the potential of metric rather than binary scales of text sentiment, we also provide a practical guide for researchers to select an appropriate method and degree of pre-processing when working with complex texts.
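
The evaluation described in the abstract, scoring each review on a metric scale with a dictionary and then correlating the predictions with human-coded sentiments, can be illustrated with a minimal sketch. This is not the authors' code: the lexicon `TOY_LEXICON`, the helper names `dictionary_score` and `pearson_r`, the averaging rule, and the English example words are all assumptions made for illustration, and the study itself worked with German reviews and established dictionaries.

```python
from math import sqrt

# Hypothetical toy lexicon (word -> polarity weight); not taken from the paper.
TOY_LEXICON = {
    "brilliant": 1.0,
    "moving": 0.5,
    "tedious": -0.5,
    "disappointing": -1.0,
}


def dictionary_score(text, lexicon):
    """Return the mean polarity of lexicon words found in the text.

    Yields a metric (continuous) score rather than a positive/negative
    label; returns 0.0 when no lexicon word occurs in the text.
    """
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Usage with invented reviews and invented human codes (illustrative only).
reviews = [
    "A brilliant, moving novel.",
    "Tedious and disappointing.",
    "A moving but tedious story.",
]
human_codes = [0.9, -0.8, 0.1]
predicted = [dictionary_score(r, TOY_LEXICON) for r in reviews]
r = pearson_r(predicted, human_codes)
```

The resulting `r` plays the same role as the correlations reported in the abstract, although its value here says nothing about real performance because the inputs are made up.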

