Jadhav Ashutosh, Sheth Amit, Pathak Jyotishman
Knoesis Center, Wright State University, Dayton, OH.
Mayo Clinic, Rochester, MN.
AMIA Annu Symp Proc. 2014 Nov 14;2014:739-48. eCollection 2014.
Since the early 2000s, use of the Internet to search for health information has increased significantly. Studying search queries can help us understand users' "information need" and how they formulate search queries ("expression of information need"). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how users search for CVD information and what they search for. We address this knowledge gap by analyzing a large corpus of 10 million CVD-related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed the structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are 'Diseases/Conditions', 'Vital-Signs', 'Symptoms' and 'Living-with'. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites.
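The abstract names two rule-based steps: assigning each query a type (keyword-based, Wh-question or Yes-No question) and mapping its UMLS semantic types to health categories. The sketch below illustrates both under stated assumptions. It is not the authors' rule set. The trigger word lists, the `SEMTYPE_RULES` and `KEYWORD_RULES` tables, and the `query_type`/`categorize` helpers are all hypothetical. MetaMap is not called: its output is assumed to be already available as a list of UMLS semantic type abbreviations per query (for example `dsyn` for Disease or Syndrome and `sosy` for Sign or Symptom).

```python
"""Hypothetical sketch of rule-based query analysis.

Assumptions (not from the paper): the trigger words, semantic-type rules
and keyword rules are illustrative only, and MetaMap output is assumed to
be supplied as a list of UMLS semantic type abbreviations per query.
"""

WH_WORDS = {"what", "why", "how", "when", "where", "which", "who", "whom", "whose"}
# Auxiliary and modal verbs that commonly open an English yes/no question.
YES_NO_STARTERS = {
    "is", "are", "am", "was", "were", "do", "does", "did",
    "can", "could", "will", "would", "should", "shall",
    "may", "might", "must", "has", "have", "had",
}

# Hypothetical rules: UMLS semantic type abbreviation -> health category.
SEMTYPE_RULES = {
    "dsyn": "Diseases/Conditions",  # Disease or Syndrome
    "sosy": "Symptoms",             # Sign or Symptom
}
# Hypothetical keyword rules for a category not covered by the rules above.
KEYWORD_RULES = {
    "Vital-Signs": ("blood pressure", "heart rate", "pulse"),
}


def query_type(query: str) -> str:
    """Classify a query as 'wh-question', 'yes-no-question' or 'keyword-based'."""
    tokens = query.lower().strip().rstrip("?").split()
    if not tokens:
        return "keyword-based"
    if tokens[0] in WH_WORDS:
        return "wh-question"
    if tokens[0] in YES_NO_STARTERS:
        return "yes-no-question"
    return "keyword-based"


def categorize(query: str, semtypes: list[str]) -> set[str]:
    """Assign zero or more health categories; a query may match several."""
    categories = {SEMTYPE_RULES[s] for s in semtypes if s in SEMTYPE_RULES}
    text = query.lower()
    for category, keywords in KEYWORD_RULES.items():
        # Plain substring match on the lower-cased query text.
        if any(k in text for k in keywords):
            categories.add(category)
    return categories


if __name__ == "__main__":
    q = "is chest pain a sign of high blood pressure"
    # Semantic types are hard-coded here in place of a real MetaMap call.
    print(query_type(q), len(q.split()), sorted(categorize(q, ["sosy", "dsyn"])))
```

Run as a script, this prints `yes-no-question 9 ['Diseases/Conditions', 'Symptoms', 'Vital-Signs']`. Returning a set lets one query fall into several categories. That multi-label choice is itself an assumption of the sketch.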