Computer Engineering Department, Yazd University, Yazd, Iran.
Int J Med Inform. 2018 Jul;115:53-63. doi: 10.1016/j.ijmedinf.2018.03.017. Epub 2018 Apr 12.
Digital epidemiology seeks to identify disease dynamics and spread behaviors using digital traces collected from search engine logs and social media posts. However, the impact of news on information-seeking behaviors has remained unknown.
The data used in this research came from two sources: (1) 48 months of query logs from the Parsijoo search engine, and (2) 28 months of documents from Parsijoo's news service. Two classes of topics, macro-topics and micro-topics, were selected to be tracked in the query logs and news. Keywords for the macro-topics were generated automatically from web-provided resources and numbered more than 10,000. The keyword set for the micro-topics was limited to a short, enumerable list of terms related to diseases and health-related activities. The experiments were organized as three studies. Study A comprises temporal analyses of 7 macro-topics in the query logs. Study B analyzes the seasonality of search patterns for 9 micro-topics. Study C assesses the impact of news media coverage on users' health-related information-seeking behaviors.
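To illustrate how a topic can be tracked in query logs through a keyword set, the following is a minimal sketch and not the authors' pipeline. The keyword sets, the `(timestamp, query)` log layout, and the function name `monthly_topic_counts` are all assumptions made for this example; the abstract does not give the actual keyword lists or log format.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical keyword sets; the study's real lists are not given in the abstract.
TOPIC_KEYWORDS = {
    "flu": {"flu", "influenza"},
    "cancer": {"cancer", "tumor"},
}


def monthly_topic_counts(log_rows, topic_keywords=TOPIC_KEYWORDS):
    """Count queries per topic and per 'YYYY-MM' month.

    A query counts toward a topic when at least one of the topic's
    keywords appears as a whitespace-separated token in the query.
    """
    counts = defaultdict(Counter)
    for timestamp, query in log_rows:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        tokens = set(query.lower().split())
        for topic, keywords in topic_keywords.items():
            if tokens & keywords:
                counts[topic][month] += 1
    return counts


# Toy log rows (assumed format: ISO timestamp, raw query string).
sample_log = [
    ("2016-01-05T09:30:00", "flu symptoms in children"),
    ("2016-01-17T21:10:00", "influenza vaccine"),
    ("2016-02-02T11:00:00", "breast cancer screening"),
    ("2016-02-03T08:45:00", "weather tomorrow"),
]
counts = monthly_topic_counts(sample_log)
```

The resulting monthly counts per topic form the kind of time series on which temporal and seasonality analyses could then be run. Exact token matching is a simplification; real query logs would likely need normalization and multi-word keyword handling.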
Study A showed that the hourly distribution of the macro-topics followed changes in the level of social activity. In contrast, the interestingness of macro-topics did not follow the regular pattern of the topic distributions. Among the macro-topics, "Pharmacotherapy" had the highest interestingness level and a wider time window of popularity. In Study B, the seasonality of a limited number of diseases and health-related activities was analyzed. Trends for infectious diseases such as flu, mumps and chicken pox were seasonal. Because most diseases covered by national vaccination plans are seasonal, the trend for "Immunization and Vaccination" was seasonal as well. Cancer awareness events caused peaks in the search trends of the "Cancer" and "Screening" micro-topics on specific days of each year; these repeated patterns may mistakenly be identified as seasonality. In Study C, we assessed the co-integration and correlation between news and query trends. Our results demonstrated that micro-topics sparsely covered in news media had the lowest level of impressiveness and, consequently, the lowest impact on users' intents.
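As an illustration of the correlation part of Study C, the sketch below computes the Pearson correlation between a news series and a query series at several lags. It is an assumed, simplified procedure and not the method reported in the paper: the toy weekly counts, the function names, and the choice of lags are all hypothetical. It does not test co-integration, which would require a dedicated statistical test (for example, an Engle-Granger style test as offered by statistics libraries such as statsmodels).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def lagged_correlations(news, queries, max_lag):
    """Correlate news[t] with queries[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` news points and the first `lag` query points
        # so that both slices have the same length.
        result[lag] = pearson(news[: len(news) - lag], queries[lag:])
    return result


# Toy weekly counts (hypothetical): query peaks trail news peaks by one step.
news = [1, 1, 8, 2, 1, 1, 7, 2, 1, 1]
queries = [5, 5, 6, 14, 6, 5, 5, 13, 6, 5]

corr = lagged_correlations(news, queries, max_lag=2)
best_lag = max(corr, key=corr.get)
```

In this toy data the strongest correlation appears at a lag of one step, which is the kind of pattern one would look for when asking whether news coverage precedes search interest. A high lagged correlation alone does not establish that news coverage causes the searches.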
Our results can reveal public reactions to social events, diseases and prevention procedures. Furthermore, we found that news trends are co-integrated with search queries and are able to reveal health-related events; however, the two cannot be used interchangeably. It is recommended that user-generated content and news documents be analyzed jointly and interactively.