
Word embeddings and deep learning for location prediction: tracking Coronavirus from British and American tweets.

Authors

Hasni Sarra, Faiz Sami

Affiliations

Department of Information and Communication Technologies, Tunisia Polytechnic School, La Marsa, Tunisia.

Laboratory of Remote Sensing and Spatial Information Systems, ENIT, Tunis, Tunisia.

Publication information

Soc Netw Anal Min. 2021;11(1):66. doi: 10.1007/s13278-021-00777-5. Epub 2021 Jul 27.

Abstract

As the Coronavirus pandemic spreads, determining its individual and societal impacts becomes increasingly important. Recent research pays special attention to the Coronavirus infodemic on social networks as a way to study such impacts. To this end, we argue that applying a geolocation process is crucial before proceeding to infodemic management. Indeed, the spread of reported events and news on social networks makes it more challenging to identify infected areas or the locations of information owners, especially at the state level. In this paper, we focus on linguistic features to encode regional variations from short and noisy texts such as tweets in order to track this disease. We pay particular attention to contextual information for a better encoding of these features. We rely on neural network-based models to capture relationships between words according to their contexts. As examples of such models, we evaluate several word embedding models to determine the most effective combination of features, that is, the one carrying the most spatial evidence. We then model words sequentially with recurrent neural networks for a better understanding of contextual information. Without defining restricted sets of local words related to the Coronavirus disease, our framework, called DeepGeoloc, demonstrates its ability to geolocate both tweets and Twitter users. It also makes it possible to capture the geosemantics of nonlocal words and to delimit the sparse use of local ones, particularly in retweets and reported events. Compared to some baselines, DeepGeoloc achieved competitive results. It also proves scalable, handling large amounts of data and geolocating new tweets, even those describing new topics related to this disease.
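The pipeline outlined in the abstract (embed each word, read the words in order with a recurrent network, then score candidate locations) can be illustrated with a minimal sketch. This is not the DeepGeoloc implementation: the class name `TinyGeoRNN`, the toy vocabulary, the `UK`/`US` labels, the layer sizes, and the plain Elman-style recurrence are all assumptions made for illustration, and the weights are random and untrained, so the scores carry no real spatial evidence.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


class TinyGeoRNN:
    """Hypothetical toy model: embedding lookup -> simple RNN -> softmax over locations."""

    def __init__(self, vocab, locations, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the untrained weights reproducible

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # Index 0 is reserved for out-of-vocabulary tokens.
        self.index = {word: i + 1 for i, word in enumerate(vocab)}
        self.locations = list(locations)
        self.hidden_dim = hidden_dim
        self.E = matrix(len(vocab) + 1, embed_dim)          # word embeddings
        self.W_xh = matrix(hidden_dim, embed_dim)           # input -> hidden
        self.W_hh = matrix(hidden_dim, hidden_dim)          # hidden -> hidden
        self.W_hy = matrix(len(self.locations), hidden_dim) # hidden -> location scores

    def predict_proba(self, text):
        """Return a dict mapping each location label to a softmax probability."""
        h = [0.0] * self.hidden_dim
        for token in text.lower().split():
            x = self.E[self.index.get(token, 0)]
            # Elman recurrence: the new state depends on the current word and the previous state.
            h = [math.tanh(dot(w_x, x) + dot(w_h, h))
                 for w_x, w_h in zip(self.W_xh, self.W_hh)]
        logits = [dot(w, h) for w in self.W_hy]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return {loc: e / total for loc, e in zip(self.locations, exps)}


# Illustrative vocabulary and labels only; a real system would learn both from data.
model = TinyGeoRNN(vocab=["lockdown", "queue", "line", "nhs", "cdc"],
                   locations=["UK", "US"])
probs = model.predict_proba("nhs queue during lockdown")
print(probs)
```

Because the final hidden state summarizes the whole word sequence, word order influences the prediction, which is the property the abstract attributes to sequential modeling. In practice the embeddings and recurrent weights would be trained on geotagged tweets rather than drawn at random.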


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b8c/8315503/06553c7fd13e/13278_2021_777_Fig1_HTML.jpg
