Distributed messaging and light streaming system for combating pandemics: A case study on spatial analysis of COVID-19 Geo-tagged Twitter dataset.

Author information

Özgüven Yavuz Melih, Eken Süleyman

Affiliations

Department of Computer Engineering, Kocaeli University, 41001 İzmit, Turkey.

Department of Information Systems Engineering, Kocaeli University, 41001 İzmit, Turkey.

Publication information

J Ambient Intell Humaniz Comput. 2023;14(2):773-787. doi: 10.1007/s12652-021-03328-0. Epub 2021 Jun 10.

Abstract

Real-time data processing and distributed messaging are long-standing problems. As the volume of spatial data grows and software solutions become more complex, there is a need for platforms that address both. In this paper, we present a distributed, lightweight streaming system for combating pandemics and give a case study on spatial analysis of the COVID-19 geo-tagged Twitter dataset. The system has three major components: translation of tweets that match user-defined bounding boxes, named entity recognition in tweets, and skyline queries. Apache Pulsar underpins all three components. With the proposed system, end users can obtain COVID-19-related information from foreign regions; filter and search tweets by location, organization, person, and miscellaneous entities; and run skyline queries. The system is evaluated against selected characteristics and performance metrics. The study differs from earlier work in applying distributed computing and big data technologies to spatial data to combat COVID-19. We conclude that Pulsar is designed to handle large volumes of data with long-term on-disk persistence.
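Two of the components named in the abstract, bounding-box matching and skyline queries, can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Apache Pulsar. The `Tweet` fields, the choice of `retweets` and `likes` as skyline attributes, the "larger is better" dominance rule, and the sample bounding box are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import NamedTuple


class Tweet(NamedTuple):
    text: str
    lon: float
    lat: float
    retweets: int  # hypothetical skyline attribute (assumption)
    likes: int     # hypothetical skyline attribute (assumption)


def in_bbox(tweet, bbox):
    """Return True if the tweet's coordinates fall inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat); edges are inclusive.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= tweet.lon <= max_lon and min_lat <= tweet.lat <= max_lat


def dominates(a, b):
    """a dominates b if a is at least as good on every attribute
    and strictly better on at least one (larger is better here)."""
    at_least_as_good = a.retweets >= b.retweets and a.likes >= b.likes
    strictly_better = a.retweets > b.retweets or a.likes > b.likes
    return at_least_as_good and strictly_better


def skyline(tweets):
    """Naive O(n^2) skyline: keep tweets that no other tweet dominates."""
    return [t for t in tweets if not any(dominates(o, t) for o in tweets)]


# Made-up sample data; coordinates and counts are illustrative only.
tweets = [
    Tweet("a", 29.9, 40.8, 10, 5),
    Tweet("b", 29.0, 41.0, 3, 20),
    Tweet("c", 28.9, 41.0, 2, 4),
    Tweet("d", -74.0, 40.7, 100, 100),  # outside the box below
]
bbox = (26.0, 36.0, 45.0, 42.0)  # rough, assumed box; not from the paper

region = [t for t in tweets if in_bbox(t, bbox)]
result = skyline(region)
print([t.text for t in result])
```

In this sample, tweet "d" is excluded by the bounding box before the skyline step, and "c" is dominated by "a", so only "a" and "b" remain. A streaming deployment would apply the same filter per message rather than over a list.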

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab1d/8190525/1794dcd31642/12652_2021_3328_Fig1_HTML.jpg
