A comprehensive social media data processing and analytics architecture by using big data platforms: a case study of twitter flood-risk messages.

Author

Podhoranyi Michal

Affiliation

IT4Innovations - VSB Technical University, 17.listopadu 15, 70833 Ostrava, Czech Republic.

Publication

Earth Sci Inform. 2021;14(2):913-929. doi: 10.1007/s12145-021-00601-w. Epub 2021 Mar 11.

DOI: 10.1007/s12145-021-00601-w
PMID: 33727982
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7951942/
Abstract

The main objective of the article is to propose an advanced architecture and workflow based on Apache Hadoop and Apache Spark big data platforms. The primary purpose of the presented architecture is collecting, storing, processing, and analysing intensive data from social media streams. This paper presents how the proposed architecture and data workflow can be applied to analyse Tweets with a specific flood topic. The secondary objective, trying to describe the flood alert situation by using only Tweet messages and exploring the informative potential of such data is demonstrated as well. The predictive machine learning approach based on Bayes Theorem was utilized to classify flood and no flood messages. For this study, approximately 100,000 Twitter messages were processed and analysed. Messages were related to the flooding domain and collected over a period of 5 days (14 May - 18 May 2018). Spark application was developed to run data processing commands automatically and to generate the appropriate output data. Results confirmed the advantages of many well-known features of Spark and Hadoop in social media data processing. It was noted that such technologies are prepared to deal with social media data streams, but there are still challenges that one has to take into account. Based on the flood tweet analysis, it was observed that Twitter messages with some considerations are informative enough to be used to estimate general flood alert situations in particular regions. Text analysis techniques proved that Twitter messages contain valuable flood-spatial information.
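
The Bayes-theorem classification of flood and no-flood messages mentioned in the abstract can be illustrated with a small multinomial Naive Bayes sketch. This is not the author's implementation: the abstract does not give the features, training data, or Spark calls used, so the toy messages, the labels `flood` and `no_flood`, the tokenizer, and the add-one smoothing below are all assumptions chosen for illustration, and plain Python stands in for the Spark application.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_messages):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_messages:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        # Bayes' theorem in log space: log P(label) + sum of log P(word | label).
        score = math.log(n / total)
        # Add-one (Laplace) smoothing so unseen words do not zero the product.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set, not data from the study.
TRAINING = [
    ("River burst its banks, streets under water", "flood"),
    ("Flood warning issued, heavy rain expected", "flood"),
    ("Flooding closes the main road near the bridge", "flood"),
    ("My inbox is flooded with emails today", "no_flood"),
    ("Great concert last night, what a show", "no_flood"),
    ("Coffee and emails before the meeting", "no_flood"),
]

model = train(TRAINING)
print(classify("Heavy rain and flood warning near the river", model))  # flood
print(classify("So many emails before the concert", model))            # no_flood
```

On a cluster the same idea would typically be expressed through a distributed library rather than hand-written counting; the sketch only shows how word likelihoods and class priors combine into a decision.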

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/ec844ec054a0/12145_2021_601_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/ec6848e587ae/12145_2021_601_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/c6eac239f3ac/12145_2021_601_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/dbe8df30f01d/12145_2021_601_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/3e8bfba0d9ea/12145_2021_601_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/9b9a2d7a5ede/12145_2021_601_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/b48e01b2d044/12145_2021_601_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/0f222ce3486b/12145_2021_601_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/baa353116430/12145_2021_601_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/0fb03deaf3c1/12145_2021_601_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/75555503c059/12145_2021_601_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/72984b205629/12145_2021_601_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98eb/7951942/17338b780983/12145_2021_601_Fig13_HTML.jpg

Similar articles

1
A comprehensive social media data processing and analytics architecture by using big data platforms: a case study of twitter flood-risk messages.
Earth Sci Inform. 2021;14(2):913-929. doi: 10.1007/s12145-021-00601-w. Epub 2021 Mar 11.
2
Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture?
J Med Internet Res. 2015 Jun 19;17(6):e154. doi: 10.2196/jmir.4220.
3
Iktishaf+: A Big Data Tool with Automatic Labeling for Road Traffic Social Sensing and Event Detection Using Distributed Machine Learning.
Sensors (Basel). 2021 Apr 24;21(9):2993. doi: 10.3390/s21092993.
4
Analysis of twitter users' sharing of official new york storm response messages.
Med 2 0. 2014 Mar 20;3(1):e1. doi: 10.2196/med20.3237. eCollection 2014 Jan-Jun.
5
Human Behavior Analysis Using Intelligent Big Data Analytics.
Front Psychol. 2021 Jul 6;12:686610. doi: 10.3389/fpsyg.2021.686610. eCollection 2021.
6
Tweet content related to sexually transmitted diseases: no joking matter.
J Med Internet Res. 2014 Oct 6;16(10):e228. doi: 10.2196/jmir.3259.
7
Identifying Sentiment of Hookah-Related Posts on Twitter.
JMIR Public Health Surveill. 2017 Oct 18;3(4):e74. doi: 10.2196/publichealth.8133.
8
Ontology-Based Healthcare Named Entity Recognition from Twitter Messages Using a Recurrent Neural Network Approach.
Int J Environ Res Public Health. 2019 Sep 27;16(19):3628. doi: 10.3390/ijerph16193628.
9
Design an efficient data driven decision support system to predict flooding by analysing heterogeneous and multiple data sources using Data Lake.
MethodsX. 2023 Jun 22;11:102262. doi: 10.1016/j.mex.2023.102262. eCollection 2023 Dec.
10
A framework to extract biomedical knowledge from gluten-related tweets: The case of dietary concerns in digital era.
Artif Intell Med. 2021 Aug;118:102131. doi: 10.1016/j.artmed.2021.102131. Epub 2021 Jun 25.