Identifying COVID-19 English informative tweets using limited labelled data.

Authors

Kothuru Srinivasulu, Santhanavijayan A

Affiliation

Department of Computer Science and Engineering, National Institute of Technology, Thuvakudi, Tiruchirappalli, Tamil Nadu 620015 India.

Publication details

Soc Netw Anal Min. 2023;13(1):25. doi: 10.1007/s13278-023-01025-8. Epub 2023 Jan 17.

DOI: 10.1007/s13278-023-01025-8
PMID: 36686376
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9844936/
Abstract

Identifying COVID-19 informative tweets is very useful in building monitoring systems to track the latest updates. Existing approaches to identifying informative tweets rely on a large number of labelled tweets to achieve good performance. As labelling is an expensive and laborious process, there is a need to develop approaches that can identify COVID-19 informative tweets using limited labelled data. In this paper, we propose a simple yet novel labelled data-efficient approach that achieves the state-of-the-art (SOTA) F1-score of 91.23 on the WNUT COVID-19 dataset using just 1000 tweets (14.3% of the full training set). Our labelled data-efficient approach starts with limited labelled data, augments it using data augmentation methods, and then fine-tunes the model on the augmented data set. It is the first work to approach the task of identifying COVID-19 English informative tweets using limited labelled data while still achieving new SOTA performance.
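
The augmentation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which data augmentation methods the authors use, so the two token-level perturbations below (random swap and random deletion) are stand-ins chosen for illustration only. The function names, the `copies` and `p` parameters, and the two example tweets are all hypothetical, and the fine-tuning step is left out.

```python
import random


def random_swap(tokens, rng):
    """Swap two randomly chosen positions (no-op for fewer than 2 tokens)."""
    if len(tokens) < 2:
        return list(tokens)
    i, j = rng.sample(range(len(tokens)), 2)
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(dataset, copies=2, seed=0):
    """Return the original (text, label) pairs plus `copies` perturbed
    variants of each non-empty text. Labels are carried over unchanged."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for text, label in dataset:
        tokens = text.split()
        if not tokens:
            continue  # nothing to perturb in an empty text
        for k in range(copies):
            # Alternate between the two illustrative perturbations.
            op = random_swap if k % 2 == 0 else random_delete
            augmented.append((" ".join(op(tokens, rng)), label))
    return augmented


# Hypothetical labelled examples: 1 = informative, 0 = uninformative.
labelled = [
    ("Officials confirmed 120 new COVID-19 cases in the city today", 1),
    ("I am so bored of staying home during covid", 0),
]

train_set = augment(labelled, copies=2)
print(len(train_set))  # 2 originals + 2 variants each = 6
```

In a pipeline of the kind the abstract outlines, the enlarged `train_set` would then be passed to a classifier for fine-tuning. The model and training configuration are not given here and are not assumed by this sketch.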




