A semi supervised approach to Arabic aspect category detection using Bert and teacher-student model.

Author information

Almasri Miada, Al-Malki Norah, Alotaibi Reem

Affiliations

Information Technology Department/Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.

European Languages Department/Faculty of Arts and Humanities, King Abdulaziz University, Jeddah, Saudi Arabia.

Publication information

PeerJ Comput Sci. 2023 Jun 8;9:e1425. doi: 10.7717/peerj-cs.1425. eCollection 2023.

Abstract

Aspect-based sentiment analysis is well researched for English but remains under-studied for Arabic, especially with respect to aspect category detection. Most existing work relies on supervised machine learning methods that require large labeled datasets. This research therefore implements a semi-supervised self-training approach that uses a noisy student framework to enhance the capability of a deep learning model, AraBERT v02. The objective is to perform aspect category detection on both the SemEval 2016 hotel review dataset and the Hotel Arabic-Reviews Dataset (HARD) 2016. The framework has four steps. First, a teacher model is trained on the aspect categories of the labeled SemEval 2016 dataset. Second, the teacher model generates pseudo labels for the unlabeled HARD dataset. Third, a noisy student model is trained on the combined datasets (∼1 million sentences) to minimize the combined cross-entropy loss. Fourth, the teacher and student models are ensembled to enhance the performance of AraBERT. In predicting the aspect categories in the combined datasets, the ensembled teacher-student model improves micro F1 by 0.3% over the initial noisy student implementation and by 1% over the teacher model. These results outperform both baselines and other deep learning models discussed in the related literature.
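The four steps can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes that aspect category detection is treated as multi-label classification, that each model outputs one probability per category, that pseudo labels are obtained with a fixed threshold of 0.5, that the combined loss is an unweighted sum of the labeled and pseudo-labeled cross-entropy terms, and that ensembling averages the two models' probabilities. All function names are hypothetical, and the AraBERT models and the student's noise are omitted.

```python
import math

def pseudo_label(teacher_probs, threshold=0.5):
    """Step 2 (assumed form): turn teacher probabilities into hard multi-label pseudo labels."""
    return [[1 if p >= threshold else 0 for p in row] for row in teacher_probs]

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over all sentences and categories."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

def combined_loss(probs_labeled, gold, probs_unlabeled, pseudo):
    """Step 3 (assumed form): loss on gold labels plus loss on pseudo labels, unweighted."""
    return (binary_cross_entropy(probs_labeled, gold)
            + binary_cross_entropy(probs_unlabeled, pseudo))

def ensemble(teacher_probs, student_probs):
    """Step 4 (assumed form): average teacher and student probabilities per category."""
    return [[(t + s) / 2.0 for t, s in zip(t_row, s_row)]
            for t_row, s_row in zip(teacher_probs, student_probs)]

def micro_f1(pred, gold):
    """Micro-averaged F1: pool true/false positives and false negatives over all categories."""
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gold):
        for p, g in zip(p_row, g_row):
            if p == 1 and g == 1:
                tp += 1
            elif p == 1 and g == 0:
                fp += 1
            elif p == 0 and g == 1:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, the ensembled probabilities would be thresholded with `pseudo_label` again before scoring with `micro_f1`, so the teacher, student, and ensemble are compared on the same footing.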

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b60d/10280399/c110a88735ca/peerj-cs-09-1425-g001.jpg
