Tokarchuk Laurissa, Wang Xinyue, Poslad Stefan
Cognitive Science Research Group, School of Electronic Engineering and Computer Science, Queen Mary, University of London, London, United Kingdom.
Centre for Intelligent Sensing, School of Electronic Engineering and Computer Science, Queen Mary, University of London, London, United Kingdom.
PLoS One. 2017 Nov 6;12(11):e0187401. doi: 10.1371/journal.pone.0187401. eCollection 2017.
In an age when people are predisposed to report real-world events through their social media accounts, many researchers value the benefits of mining user-generated content from social media. Compared with traditional news media, social media services such as Twitter can provide more complete and timely information about real-world events. However, an event is often like a puzzle: to solve the puzzle and understand the event, we must identify all of its sub-events, or pieces. Existing Twitter event monitoring systems for sub-event detection and summarization typically analyse events based on partial data, because conventional data collection methodologies are unable to collect comprehensive event data. As a result, existing systems are often unable to report sub-events in real time and often miss sub-events, or pieces of the broader event puzzle, completely. This paper proposes a Sub-event detection by real-TIme Microblog monitoring (STRIM) framework that leverages the temporal features of an expanded set of news-worthy event content. To identify sub-events more comprehensively and accurately, the framework first proposes the use of adaptive microblog crawling. Our adaptive microblog crawler is capable of increasing the coverage of events while minimizing the amount of non-relevant content. We then propose a stream division methodology that can be carried out in real time, so that the temporal features of the expanded event streams can be analysed by a burst detection algorithm. In the final steps of the framework, content features are extracted from each divided stream and recombined to provide a final summarization of the sub-events. The proposed framework is evaluated against traditional event detection using event recall and event precision metrics. Results show that improving the quality and coverage of event content contributes to better event detection by identifying additional valid sub-events. The novel combination of our proposed adaptive crawler and our stream division/recombination technique provides significant gains in event recall (44.44%) and event precision (9.57%). The addition of these sub-events, or pieces, allows us to get closer to solving the event puzzle.
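The stream division, burst detection and recombination steps described in the abstract can be illustrated with a minimal sketch. It is not the STRIM implementation: the abstract does not specify how streams are divided or which burst detection algorithm is used, so the keyword-based division, the mean-plus-k-standard-deviations threshold, and every name and parameter below (`divide_stream`, `detect_bursts`, `detect_sub_events`, `window`, `k`, `min_count`) are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def divide_stream(posts, keywords, bin_seconds=60):
    """Split (timestamp, text) posts into one binned count series per keyword.

    Assumption: keyword matching stands in for the paper's stream division.
    """
    streams = {kw: defaultdict(int) for kw in keywords}
    for timestamp, text in posts:
        lowered = text.lower()
        bin_index = int(timestamp // bin_seconds)
        for kw in keywords:
            if kw in lowered:
                streams[kw][bin_index] += 1
    return streams


def detect_bursts(counts, first_bin, last_bin, window=5, k=3.0, min_count=5):
    """Flag bins whose count exceeds mean + k * std of the preceding window.

    Assumption: a simple threshold rule, not the paper's burst detector.
    """
    bursts = []
    for b in range(first_bin + window, last_bin + 1):
        history = [counts.get(i, 0) for i in range(b - window, b)]
        threshold = mean(history) + k * pstdev(history)
        current = counts.get(b, 0)
        if current >= min_count and current > threshold:
            bursts.append(b)
    return bursts


def detect_sub_events(posts, keywords, bin_seconds=60, **burst_options):
    """Divide the stream, detect bursts per stream, and recombine by time bin."""
    if not posts:
        return {}
    streams = divide_stream(posts, keywords, bin_seconds)
    bins = [int(t // bin_seconds) for t, _ in posts]
    first_bin, last_bin = min(bins), max(bins)
    merged = defaultdict(list)
    for kw, counts in streams.items():
        for b in detect_bursts(counts, first_bin, last_bin, **burst_options):
            merged[b].append(kw)  # streams bursting in the same bin are grouped
    return dict(merged)


# Toy data: steady background chatter, then a spike of "goal" posts in minute 6.
posts = [(b * 60 + 1, "watching the match") for b in range(6)]
posts += [(6 * 60 + j, "GOAL what a strike") for j in range(10)]
posts += [(7 * 60 + 1, "back to the match")]

sub_events = detect_sub_events(posts, ["goal", "match"])
```

In this toy data only the "goal" stream bursts, in time bin 6, while the steady "match" stream never crosses the threshold. Analysing each divided stream separately is what lets a small but sharp spike stand out instead of being averaged away in the combined stream.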