Gao Qingyu, Feng Dezheng William
Department of English and Communication, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China.
PLoS One. 2025 Jan 9;20(1):e0313932. doi: 10.1371/journal.pone.0313932. eCollection 2025.
This study aims to propose a method based on large language models (LLMs) for the discourse analysis of media attitudes, and to apply it to investigate attitudes towards China in a Hong Kong-based newspaper. Analysing attitudes in large volumes of media data is crucial for understanding public opinion, market trends and social dynamics. However, corpus-based approaches have traditionally focused on explicit linguistic expressions of attitude, leaving implicit expressions unexamined. To address this gap, the present study explores the use of LLMs for the automated identification and classification of both explicit and implicit attitudes, and evaluates the feasibility of implementing this approach on personal computers. The analysis is based on the appraisal framework proposed by Martin and White, which provides a structured approach to describing different aspects of media attitudes [1]. Meta's open-source Llama 2 (13B) was used for automated attitude analysis and was quantised for deployment on personal computers. The quantised LLM was then used to analyse 40,000 expressions about China in a corpus of news reports from Oriental Daily News, a top-selling newspaper in Hong Kong. The results demonstrate that the quantised LLM can accurately capture both explicit and implicit attitudes, with a success rate of approximately 80%, comparable to that of proficient human coders. Challenges encountered during implementation and potential coping strategies are also discussed.
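The classification step described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: the prompt wording, the label set (Martin and White's three ATTITUDE subsystems plus polarity and explicitness), the JSON reply format, and the helper names `build_prompt`, `parse_reply`, `classify` and `agreement_rate` are all assumptions. The model is abstracted as any callable that maps a prompt string to generated text, so a locally run quantised model could in principle be plugged in; `agreement_rate` shows one simple way a success rate against human codes might be computed.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical label set loosely based on Martin and White's ATTITUDE system;
# the exact categories and prompt used in the study are not specified here.
CATEGORIES = ("affect", "judgement", "appreciation", "none")
POLARITIES = ("positive", "negative", "neutral")
EXPLICITNESS = ("explicit", "implicit")

# Hypothetical prompt; doubled braces survive str.format() as literal braces.
PROMPT_TEMPLATE = (
    "Classify the attitude towards China in the sentence below using "
    "Martin and White's appraisal framework.\n"
    "Answer with JSON only, e.g. "
    '{{"category": "judgement", "polarity": "negative", "explicitness": "implicit"}}.\n'
    "Sentence: {sentence}\n"
)


@dataclass(frozen=True)
class AttitudeLabel:
    category: str
    polarity: str
    explicitness: str


def build_prompt(sentence: str) -> str:
    """Insert the target expression into the (hypothetical) prompt template."""
    return PROMPT_TEMPLATE.format(sentence=sentence)


def parse_reply(reply: str) -> Optional[AttitudeLabel]:
    """Extract the first flat JSON object from a model reply.

    Returns None when no object is found, it fails to parse, or any
    field falls outside the allowed label set.
    """
    match = re.search(r"\{.*?\}", reply, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    category = str(data.get("category", "")).strip().lower()
    polarity = str(data.get("polarity", "")).strip().lower()
    explicitness = str(data.get("explicitness", "")).strip().lower()
    if (category not in CATEGORIES or polarity not in POLARITIES
            or explicitness not in EXPLICITNESS):
        return None
    return AttitudeLabel(category, polarity, explicitness)


def classify(sentence: str, generate: Callable[[str], str]) -> Optional[AttitudeLabel]:
    """Run one expression through any prompt -> text generator and parse the reply."""
    return parse_reply(generate(build_prompt(sentence)))


def agreement_rate(predicted: Sequence[Optional[AttitudeLabel]],
                   gold: Sequence[AttitudeLabel]) -> float:
    """Share of items whose model label exactly matches the human label.

    Unparseable model output (None) counts as a disagreement.
    """
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)
```

For example, with a stand-in generator such as `lambda prompt: 'Answer: {"category": "Judgement", "polarity": "negative", "explicitness": "implicit"}'`, `classify` returns `AttitudeLabel("judgement", "negative", "implicit")`. Treating unparseable replies as misses keeps the agreement figure conservative, since malformed model output is a practical issue when running quantised models.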