Kämpf Mirko, Tessenow Eric, Kenett Dror Y, Kantelhardt Jan W
Institut für Physik, Martin-Luther-Universität Halle-Wittenberg, Sachsen-Anhalt, Germany.
School of Media and Communication, University of Leeds, Leeds, United Kingdom.
PLoS One. 2015 Dec 31;10(12):e0141892. doi: 10.1371/journal.pone.0141892. eCollection 2015.
Can online media predict new and emerging trends, given the relationship between trends in society and their representation in online systems? Several recent studies have used Google Trends as the leading online information source to address such questions; we focus instead on the online encyclopedia Wikipedia, which is often used for deeper topical reading. Wikipedia grants open access to all traffic data and, beyond single keywords, provides a large amount of additional (semantic) information through a context network. Specifically, we propose and study context-normalized, time-dependent measures of a topic's importance, based on page-view time series of Wikipedia articles in different languages and of the articles related to them by internal links. As an example, we present a study of the recently emerging Big Data market with a focus on the Hadoop ecosystem, and compare the capabilities of Wikipedia and Google in predicting its popularity and life cycles. To support further applications, we have developed an open web platform for sharing the results of Wikipedia analytics, providing context-rich and language-independent relevance measures for emerging trends.
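The idea of a context-normalized, time-dependent measure can be illustrated with a minimal sketch. The abstract does not give the measure's definition, so the formula below is an assumption chosen purely for illustration: at each time step, the central article's page views are divided by the total page views of the article plus its linked neighbours. The function name `context_normalized_relevance`, its parameters, and the numbers are hypothetical, not the authors' implementation or data.

```python
def context_normalized_relevance(page_views, context_views):
    """Share of views going to the central article at each time step.

    page_views:    list of view counts for the central article.
    context_views: dict mapping a linked article's title to its list of
                   view counts (same length as page_views).

    Assumption: relevance = central views / (central + neighbour views).
    This is an illustrative stand-in, not the measure defined in the paper.
    """
    n = len(page_views)
    if any(len(series) != n for series in context_views.values()):
        raise ValueError("all time series must have the same length")

    relevance = []
    for t in range(n):
        neighbour_total = sum(series[t] for series in context_views.values())
        total = page_views[t] + neighbour_total
        # Avoid division by zero when nobody viewed any page at time t.
        relevance.append(page_views[t] / total if total else 0.0)
    return relevance


# Toy numbers (invented, not real Wikipedia traffic).
central = [10, 20, 60]
neighbours = {"Linked article A": [40, 40, 20], "Linked article B": [50, 40, 20]}
relevance = context_normalized_relevance(central, neighbours)
```

In this toy case the central article's share of local attention rises across the three time steps even though total traffic in the neighbourhood stays constant, which is the kind of signal a raw page-view count alone would not separate from overall growth in the topic area.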