Lee Woojin, Lee Jaewook, Kim Harksoo
Department of Artificial Intelligence, Konkuk University, Seoul, Republic of Korea.
Department of Computer Science and Engineering, Konkuk University, Seoul, Republic of South Korea.
PeerJ Comput Sci. 2024 Dec 3;10:e2585. doi: 10.7717/peerj-cs.2585. eCollection 2024.
Stance detection is a critical task in natural language processing that determines an author's viewpoint toward a specific target, playing a pivotal role in social science research and various applications. Traditional approaches incorporating Wikipedia-sourced data into small language models (SLMs) to compensate for limited target knowledge often suffer from inconsistencies in article quality and length due to the diverse pool of Wikipedia contributors. To address these limitations, we utilize large language models (LLMs) pretrained on expansive datasets to generate accurate and contextually relevant target knowledge. By providing concise, real-world insights tailored to the stance detection task, this approach surpasses the limitations of Wikipedia-based information. Despite their superior reasoning capabilities, LLMs are computationally intensive and challenging to deploy on smaller devices. To mitigate these drawbacks, we introduce a reasoning distillation methodology that transfers the reasoning capabilities of LLMs to more compact SLMs, enhancing their efficiency while maintaining robust performance. Our stance detection model, LOGIC (LLM-Originated Guidance for Internal Cognitive improvement of small language models in stance detection), is built on Bidirectional and Auto-Regressive Transformer (BART) and fine-tuned with auxiliary learning tasks, including reasoning distillation. By incorporating LLM-generated target knowledge into the inference process, LOGIC achieves state-of-the-art performance on the VAried Stance Topics (VAST) dataset, outperforming advanced models like GPT-3.5 Turbo and GPT-4 Turbo in stance detection tasks.
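To make the described approach concrete, the following is a minimal sketch (not the authors' released code) of the LOGIC-style training idea: a BART-based SLM is fine-tuned on the stance prediction task while an auxiliary reasoning distillation objective teaches it to reproduce target knowledge generated offline by an LLM. The model checkpoint, prompt format, seq2seq treatment of the stance label, and the loss weighting alpha are illustrative assumptions rather than details taken from the paper.

from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def training_step(document, target, llm_knowledge, stance_label, alpha=0.5):
    """One combined step: stance prediction plus reasoning distillation.

    llm_knowledge: target knowledge / rationale text produced offline by an LLM
    stance_label:  one of "pro", "con", "neutral" (VAST-style labels)
    alpha:         assumed weighting between the two losses
    """
    prompt = f"Document: {document} Target: {target}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)

    # Main task: generate the stance label as text (seq2seq classification).
    label_ids = tokenizer(stance_label, return_tensors="pt").input_ids
    stance_loss = model(**inputs, labels=label_ids).loss

    # Auxiliary task: reproduce the LLM-generated target knowledge
    # (reasoning distillation into the smaller model).
    knowledge_ids = tokenizer(llm_knowledge, return_tensors="pt",
                              truncation=True).input_ids
    distill_loss = model(**inputs, labels=knowledge_ids).loss

    return alpha * stance_loss + (1 - alpha) * distill_loss

At inference time, the same LLM-generated target knowledge would be appended to the prompt so the compact model can draw on it without invoking the LLM itself; the exact prompt composition here is an assumption.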