CovTiNet: Covid text identification network using attention-based positional embedding feature fusion.

Author information

Hossain Md Rajib, Hoque Mohammed Moshiul, Siddique Nazmul, Sarker Iqbal H

Affiliations

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong 4349, Bangladesh.

School of Computing, Engineering and Intelligent Systems, Ulster University, Londonderry, UK.

Publication information

Neural Comput Appl. 2023;35(18):13503-13527. doi: 10.1007/s00521-023-08442-y. Epub 2023 Mar 14.

Abstract

Covid text identification (CTI) is a crucial research concern in natural language processing (NLP). Owing to effortless access to the Internet and electronic gadgets, and to the Covid outbreak, social and electronic media are simultaneously adding a large volume of Covid-affiliated text to the World Wide Web. Most of these texts are uninformative and contain misinformation, disinformation and malinformation that create an infodemic. Thus, Covid text identification is essential for controlling societal distrust and panic. Though very little Covid-related research (such as on Covid disinformation, misinformation and fake news) has been reported in high-resource languages (e.g. English), CTI in low-resource languages (like Bengali) is still at a preliminary stage. Automatic CTI in Bengali text is challenging due to the lack of benchmark corpora, complex linguistic constructs, immense verb inflexions and the scarcity of NLP tools. On the other hand, manual processing of Bengali Covid texts is arduous and costly because of their messy or unstructured forms. This research proposes a deep learning-based network (CovTiNet) to identify Covid text in Bengali. CovTiNet incorporates attention-based positional embedding feature fusion for text-to-feature representation and an attention-based CNN for Covid text identification. Experimental results show that the proposed CovTiNet achieved the highest accuracy, 96.61±.001%, on the developed dataset () compared to the other methods and baselines (i.e. BERT-M, IndicBERT, ELECTRA-Bengali, DistilBERT-M, BiLSTM, DCNN, CNN, LSTM, VDCNN and ACNN).
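The abstract describes the architecture only at a high level. As a rough, non-authoritative illustration, the sketch below shows one generic way to (i) add positional encodings to several embedding views of the same tokens and fuse them with a per-token attention weighting, then (ii) pass the fused features through a small 1D CNN with attention pooling. This is not the authors' CovTiNet. The sinusoidal encoding, scaled dot-product scoring, kernel sizes, the binary logistic output, the toy numbers and every function name (`fuse`, `conv1d_relu`, `attention_pool`, `covid_score`) are assumptions chosen for illustration, with hand-set values standing in for trained parameters.

```python
import math

# Hypothetical illustration only: generic attention-weighted fusion of
# position-aware embeddings followed by a tiny attention-pooled 1D CNN.
# None of the names, sizes or weights below come from the CovTiNet paper.

def sinusoidal_positions(seq_len, dim):
    """Sinusoidal positional encoding (Transformer-style); an assumed choice."""
    return [
        [math.sin(pos / 10000 ** (2 * (i // 2) / dim)) if i % 2 == 0
         else math.cos(pos / 10000 ** (2 * (i // 2) / dim))
         for i in range(dim)]
        for pos in range(seq_len)
    ]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(sources, query):
    """Fuse K embedding views (each seq_len x dim) token by token.

    The same positional encoding is added to every view; a scaled dot product
    with `query` (learned in practice, fixed here) scores each view, and a
    softmax over those scores gives the per-token mixing weights.
    """
    seq_len, dim = len(sources[0]), len(sources[0][0])
    pe = sinusoidal_positions(seq_len, dim)
    fused, weights = [], []
    for t in range(seq_len):
        views = [[v + p for v, p in zip(src[t], pe[t])] for src in sources]
        scores = [sum(q * v for q, v in zip(query, view)) / math.sqrt(dim)
                  for view in views]
        w = softmax(scores)
        weights.append(w)
        fused.append([sum(w[k] * views[k][d] for k in range(len(views)))
                      for d in range(dim)])
    return fused, weights

def conv1d_relu(x, kernels):
    """Valid 1D convolution over tokens with ReLU; each kernel is width x dim."""
    width, dim = len(kernels[0]), len(x[0])
    return [
        [max(0.0, sum(k[j][d] * x[t + j][d]
                      for j in range(width) for d in range(dim)))
         for k in kernels]
        for t in range(len(x) - width + 1)
    ]

def attention_pool(h, context):
    """Softmax-weighted sum of the convolutional feature rows."""
    a = softmax([sum(c * v for c, v in zip(context, row)) for row in h])
    pooled = [sum(a[t] * h[t][f] for t in range(len(h)))
              for f in range(len(h[0]))]
    return pooled, a

def covid_score(pooled, w, b):
    """Logistic output, assuming a binary Covid / non-Covid decision."""
    z = sum(wi * xi for wi, xi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Usage with toy numbers: 3 tokens, 4-dim embeddings, two embedding views.
emb_a = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
emb_b = [[0.0, 0.1, 0.0, 0.1], [0.2, 0.2, 0.2, 0.2], [0.9, 0.1, 0.4, 0.0]]
fused, weights = fuse([emb_a, emb_b], query=[1.0, 0.0, 1.0, 0.0])
kernels = [[[0.1] * 4, [0.2] * 4],
           [[0.3, -0.1, 0.0, 0.2], [0.1, 0.1, -0.2, 0.0]]]
features = conv1d_relu(fused, kernels)
pooled, attn = attention_pool(features, context=[1.0, 1.0])
score = covid_score(pooled, w=[0.5, -0.5], b=0.0)
```

In a trained model the query vector, convolution kernels, pooling context and output weights would all be learned; plain Python lists are used here only to keep the example dependency-free and easy to trace by hand.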

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04aa/10011801/557dfc06262f/521_2023_8442_Fig1_HTML.jpg