IEEE Trans Cybern. 2018 Nov;48(11):3218-3231. doi: 10.1109/TCYB.2017.2762344. Epub 2017 Oct 24.
Detecting events in massive social media data on social networks can help corporations, governments, and users browse, search, and monitor real-time events. The short, conversational, heterogeneous, and real-time nature of social media data poses great challenges for event detection. Existing event detection approaches rely mainly on textual information, while the visual content of microblogs and the intrinsic correlations among heterogeneous data have scarcely been explored. To address these challenges, we propose a novel real-time event detection method that generates an intermediate semantic level from social multimedia data, named the microblog clique (MC), which captures the high correlations among different microblogs. Specifically, the proposed method comprises three stages. First, the heterogeneous data in microblogs are formulated as a hypergraph, and a hypergraph cut is performed to group highly correlated microblogs on the same topic into MCs, which addresses the problems of information inadequacy and data sparseness. Second, a bipartite graph is constructed from the generated MCs, and transfer cut partitioning is performed to detect events. Finally, for newly arriving microblogs, an incremental hypergraph is constructed on the latest MCs to generate new MCs, which are then classified, via bipartite graph partitioning, into existing events or new ones. Extensive experiments on the events in the Brand-Social-Net dataset demonstrate the superiority of the proposed method over state-of-the-art approaches.
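As a rough illustration of the first stage, the Python sketch below builds a hypergraph in which each hyperedge joins all microblogs that share one feature (a hashtag, a word, or a hypothetical visual-cluster label such as "img:red"), then splits it in two using the spectral relaxation of the normalized hypergraph cut of Zhou, Huang, and Schölkopf (2006). The feature-sharing hyperedge construction, unit hyperedge weights, the two-way split, and the function names build_hypergraph and hypergraph_bipartition are assumptions made for illustration. This is not the authors' implementation: the abstract does not specify how hyperedges are built or weighted, and producing many MCs would require applying such a cut recursively or using several eigenvectors.

```python
import math
from collections import defaultdict


def build_hypergraph(microblogs):
    """Hypothetical construction: one hyperedge per feature shared by >= 2 microblogs.

    `microblogs` is a list of feature sets (words, hashtags, visual labels).
    Features seen in only one microblog are dropped, since they connect nothing.
    """
    members = defaultdict(set)
    for i, feats in enumerate(microblogs):
        for f in feats:
            members[f].add(i)
    return [sorted(m) for _, m in sorted(members.items()) if len(m) > 1]


def hypergraph_bipartition(n, hyperedges, weights=None, iters=500):
    """Two-way normalized hypergraph cut via its spectral relaxation.

    Finds the second-largest eigenvector of
        Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    by power iteration, then splits vertices by the sign of Dv^{-1/2} x.
    """
    if weights is None:
        weights = [1.0] * len(hyperedges)  # assumption: unit hyperedge weights
    # Vertex degree d(v): total weight of the hyperedges containing v.
    dv = [0.0] * n
    for e, w in zip(hyperedges, weights):
        for v in e:
            dv[v] += w
    if any(d == 0 for d in dv):
        raise ValueError("every vertex must lie in at least one hyperedge")
    inv_sqrt = [1.0 / math.sqrt(d) for d in dv]

    def theta(x):
        # Apply Theta to x without forming the n-by-n matrix.
        y = [0.0] * n
        for e, w in zip(hyperedges, weights):
            s = sum(inv_sqrt[v] * x[v] for v in e) * w / len(e)  # De(e) = |e|
            for v in e:
                y[v] += inv_sqrt[v] * s
        return y

    # Theta is symmetric PSD; its top eigenvector (eigenvalue 1) is sqrt(d(v)).
    top = [math.sqrt(d) for d in dv]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    def deflate(x):
        # Project out the trivial top eigenvector.
        c = sum(a * b for a, b in zip(x, top))
        return [a - c * b for a, b in zip(x, top)]

    x = deflate([float(i + 1) for i in range(n)])  # deterministic start vector
    for _ in range(iters):
        x = deflate(theta(x))
        norm = math.sqrt(sum(a * a for a in x))
        if norm == 0:
            break
        x = [a / norm for a in x]

    # Map back to the relaxed cut indicator and threshold at zero.
    f = [inv_sqrt[v] * x[v] for v in range(n)]
    left = [v for v in range(n) if f[v] >= 0]
    right = [v for v in range(n) if f[v] < 0]
    return left, right


if __name__ == "__main__":
    # Toy data: two topics linked only by a shared "today" term.
    blogs = [
        {"#launch", "phone", "img:red"},
        {"#launch", "phone"},
        {"#launch", "img:red", "today"},
        {"#concert", "music", "today"},
        {"#concert", "music", "img:stage"},
        {"#concert", "img:stage"},
    ]
    edges = build_hypergraph(blogs)
    # Separates {0, 1, 2} from {3, 4, 5}; which side is "left" depends on sign.
    print(hypergraph_bipartition(len(blogs), edges))
```

Because Theta is symmetric positive semidefinite and its top eigenvector is known in closed form, deflating that vector on every iteration lets plain power iteration converge to the eigenvector that minimizes the relaxed normalized cut, without an external linear-algebra library.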