A semi supervised approach to Arabic aspect category detection using Bert and teacher-student model.

Authors

Almasri Miada, Al-Malki Norah, Alotaibi Reem

Affiliations

Information Technology Department/Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.

European Languages Department/Faculty of Arts and Humanities, King Abdulaziz University, Jeddah, Saudi Arabia.

Publication

PeerJ Comput Sci. 2023 Jun 8;9:e1425. doi: 10.7717/peerj-cs.1425. eCollection 2023.

DOI: 10.7717/peerj-cs.1425
PMID: 37346563
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10280399/
Abstract

Aspect-based sentiment analysis tasks are well researched in English. However, we find such research lacking for the Arabic language, especially with regard to aspect category detection. Most of the existing research focuses on supervised machine learning methods that require large, labeled datasets. Therefore, the aim of this research is to implement a semi-supervised self-training approach that uses a noisy student framework to enhance the capability of a deep learning model, AraBERT v02. The objective is to perform aspect category detection on both the SemEval 2016 hotel review dataset and the Hotel Arabic-Reviews Dataset (HARD) 2016. The framework has four steps. First, a teacher model is trained on the aspect categories of the labeled SemEval 2016 dataset. Second, the teacher model generates pseudo labels for the unlabeled HARD dataset. Third, a noisy student model is trained on the combined datasets (∼1 million sentences) with the aim of minimizing the combined cross-entropy loss. Fourth, the teacher and student models are ensembled to enhance the performance of AraBERT. Findings indicate that, in predicting the aspect categories in the combined datasets, the ensembled teacher-student model improves micro F1 by 0.3% over the initial noisy student implementation and by 1% over the teacher model. These results outperform both baselines and other deep learning models discussed in the related literature.
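The four steps in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: a one-vs-rest logistic regression stands in for AraBERT v02, synthetic feature vectors stand in for the SemEval and HARD reviews, random input dropout is an assumed form of student noise, and averaging the two models' probabilities is an assumed form of ensembling. All function names (`train`, `predict_proba`, `self_training_demo`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Per-category probabilities for one feature vector (one-vs-rest)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x))) for row in weights]


def binarize(probs, threshold=0.5):
    return [int(p >= threshold) for p in probs]


def train(xs, ys, n_labels, epochs=100, lr=0.5, input_dropout=0.0, seed=0):
    """SGD on binary cross-entropy. input_dropout > 0 randomly zeroes
    features during training; this is the (assumed) noise for the student."""
    rng = random.Random(seed)
    weights = [[0.0] * len(xs[0]) for _ in range(n_labels)]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            noisy = [0.0 if rng.random() < input_dropout else v for v in x]
            probs = predict_proba(weights, noisy)
            for k in range(n_labels):
                for j, v in enumerate(noisy):
                    # Gradient of cross-entropy w.r.t. weight: (p - y) * x
                    weights[k][j] -= lr * (probs[k] - y[k]) * v
    return weights


def micro_f1(y_true, y_pred):
    """Micro F1: pool TP/FP/FN over all samples and categories."""
    pairs = [(t, p) for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp)]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def make_data(n, rng):
    """Synthetic multi-label data: [bias, f1, f2] -> two toy categories."""
    xs = [[1.0, rng.random(), rng.random()] for _ in range(n)]
    ys = [[int(x[1] > 0.5), int(x[2] > 0.5)] for x in xs]
    return xs, ys


def self_training_demo(seed=0):
    rng = random.Random(seed)
    labeled_x, labeled_y = make_data(20, rng)   # small labeled set
    unlabeled_x, _ = make_data(200, rng)        # labels discarded
    test_x, test_y = make_data(100, rng)

    # Step 1: train the teacher on labeled data only.
    teacher = train(labeled_x, labeled_y, n_labels=2)
    # Step 2: the teacher pseudo-labels the unlabeled data.
    pseudo_y = [binarize(predict_proba(teacher, x)) for x in unlabeled_x]
    # Step 3: train a noisy student on labeled + pseudo-labeled data.
    student = train(labeled_x + unlabeled_x, labeled_y + pseudo_y,
                    n_labels=2, input_dropout=0.1, seed=seed + 1)

    # Step 4: ensemble by averaging teacher and student probabilities.
    def ensemble_proba(x):
        t, s = predict_proba(teacher, x), predict_proba(student, x)
        return [(a + b) / 2 for a, b in zip(t, s)]

    def score(proba_fn):
        return micro_f1(test_y, [binarize(proba_fn(x)) for x in test_x])

    return {
        "teacher": score(lambda x: predict_proba(teacher, x)),
        "student": score(lambda x: predict_proba(student, x)),
        "ensemble": score(ensemble_proba),
        "n_pseudo": len(pseudo_y),
    }


scores = self_training_demo()
print(scores)
```

The printed dictionary holds the micro F1 of each model on the synthetic test split. On toy data like this the three scores may differ little or not at all; the sketch shows the order of operations, not the gains reported in the abstract.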

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b60d/10280399/1839619d7322/peerj-cs-09-1425-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b60d/10280399/c110a88735ca/peerj-cs-09-1425-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b60d/10280399/3802a81f879a/peerj-cs-09-1425-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b60d/10280399/f4a775349f5b/peerj-cs-09-1425-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b60d/10280399/0f8fb517ed54/peerj-cs-09-1425-g004.jpg

Similar articles

1
A semi supervised approach to Arabic aspect category detection using Bert and teacher-student model.
PeerJ Comput Sci. 2023 Jun 8;9:e1425. doi: 10.7717/peerj-cs.1425. eCollection 2023.
2
Efficient Combination of CNN and Transformer for Dual-Teacher Uncertainty-guided Semi-supervised Medical Image Segmentation.
Comput Methods Programs Biomed. 2022 Nov;226:107099. doi: 10.1016/j.cmpb.2022.107099. Epub 2022 Sep 2.
3
Leveraging Symbolic Knowledge Bases for Commonsense Natural Language Inference Using Pattern Theory.
IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):13185-13202. doi: 10.1109/TPAMI.2023.3287837. Epub 2023 Oct 3.
4
Semi-supervised training of deep convolutional neural networks with heterogeneous data and few local annotations: An experiment on prostate histopathology image classification.
Med Image Anal. 2021 Oct;73:102165. doi: 10.1016/j.media.2021.102165. Epub 2021 Jul 14.
5
Improving Skin Lesion Segmentation with Self-Training.
Cancers (Basel). 2024 Mar 11;16(6):1120. doi: 10.3390/cancers16061120.
6
ArabBert-LSTM: improving Arabic sentiment analysis based on transformer model and Long Short-Term Memory.
Front Artif Intell. 2024 Jul 2;7:1408845. doi: 10.3389/frai.2024.1408845. eCollection 2024.
7
Complementary label learning based on knowledge distillation.
Math Biosci Eng. 2023 Sep 19;20(10):17905-17918. doi: 10.3934/mbe.2023796.
8
TEST: Triplet Ensemble Student-Teacher Model for Unsupervised Person Re-Identification.
IEEE Trans Image Process. 2021;30:7952-7963. doi: 10.1109/TIP.2021.3112039. Epub 2021 Sep 22.
9
ArSa-Tweets: A novel Arabic sarcasm detection system based on deep learning model.
Heliyon. 2024 Aug 28;10(17):e36892. doi: 10.1016/j.heliyon.2024.e36892. eCollection 2024 Sep 15.
10
A semi-supervised learning framework for micropapillary adenocarcinoma detection.
Int J Comput Assist Radiol Surg. 2022 Apr;17(4):639-648. doi: 10.1007/s11548-022-02565-8. Epub 2022 Feb 12.