Multilingual topic modeling for tracking COVID-19 trends based on Facebook data analysis.

Author information

Amara Amina, Hadj Taieb Mohamed Ali, Ben Aouicha Mohamed

Affiliations

Multimedia, InfoRmation systems and Advanced Computing Laboratory, University of Sfax, Sfax, Tunisia.

Faculty of Sciences, University of Sfax, Sfax, Tunisia.

Publication information

Appl Intell (Dordr). 2021;51(5):3052-3073. doi: 10.1007/s10489-020-02033-3. Epub 2021 Feb 13.

Abstract

Social data have played an important role in tracking, monitoring and managing the risk of disasters. Several works have focused on the benefits of social data analysis for healthcare practice and treatment. These data are now also exploited for tracking the COVID-19 pandemic, but the majority of works use Twitter as their source. In this paper, we choose to exploit Facebook, which is rarely used, to track the evolution of COVID-19-related trends. A multilingual dataset covering 7 languages (English (EN), Arabic (AR), Spanish (ES), Italian (IT), German (DE), French (FR) and Japanese (JP)) is extracted from Facebook public posts. The proposal is an analytics process comprising a data gathering step, pre-processing, LDA-based topic modeling and a presentation module that uses a graph structure. The analysis covers the period from January 1, 2020 to May 15, 2020, divided cumulatively into three periods: the first covering January-February, the second March-April and the last running to May 15. The results show that the extracted topics follow the chronological development of what circulated about the pandemic and of the measures taken, as reflected in the languages under study, which represent several countries.
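The cumulative division into three periods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `cumulative_windows`, the `(language, date, text)` tuple layout, the sample posts, and the reading of "cumulative" as "every window starts on January 1 and runs to the end of its period" are all hypothetical. Under that reading, one topic model would then be fitted per language on each window.

```python
from datetime import date

# Period boundaries read from the abstract; the exact end dates
# (Feb 29, Apr 30, May 15) are an assumption of this sketch.
START = date(2020, 1, 1)
PERIOD_ENDS = [date(2020, 2, 29), date(2020, 4, 30), date(2020, 5, 15)]


def cumulative_windows(posts, start=START, period_ends=PERIOD_ENDS):
    """Group (language, posted_on, text) tuples into cumulative windows.

    Returns {period_end: {language: [text, ...]}}, where each window holds
    every post dated from `start` through that period end, so later
    windows contain the earlier ones.
    """
    windows = {end: {} for end in period_ends}
    for language, posted_on, text in posts:
        for end in period_ends:
            if start <= posted_on <= end:
                windows[end].setdefault(language, []).append(text)
    return windows


# Hypothetical sample posts, invented purely for illustration.
posts = [
    ("EN", date(2020, 1, 20), "new virus reported"),
    ("FR", date(2020, 3, 17), "confinement annoncé"),
    ("EN", date(2020, 5, 10), "lockdown easing begins"),
    ("EN", date(2020, 6, 1), "outside the study range"),
]
windows = cumulative_windows(posts)
```

Each `windows[end][language]` list is the corpus that a per-language LDA model would be trained on for that period; posts dated outside the study range are dropped.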

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e5e/7881346/0862244a6922/10489_2020_2033_Fig1_HTML.jpg
