Eysenbach G, Kohler Ch
Centre for Global eHealth Innovation, University Health Network, Toronto General Hospital, Canada.
AMIA Annu Symp Proc. 2003;2003:225-9.
While health information is often said to be the most sought-after information on the web, empirical data on the actual frequency of health-related searches are missing. In the present study we aimed to determine the prevalence of health-related searches on the web by analyzing search terms entered by people into popular search engines. We also made some preliminary attempts to qualitatively describe and classify these searches. Occasional difficulties in determining what constitutes a "health-related" search led us to propose and validate a simple method to automatically classify a search string as "health-related". This method is based on the number of web pages containing both the search string and the word "health", expressed as a proportion of the total number of pages containing the search string alone. Using human codings as the gold standard, we plotted a ROC curve and determined empirically that if this "co-occurrence rate" is larger than 35%, the search string can be said to be health-related (sensitivity: 85.2%, specificity: 80.4%). Our "human" codings of search queries indicated that about 4.5% of all searches are "health-related". We estimate that globally a minimum of 6.75 million health-related searches are conducted on the web every day, which is roughly the same number of searches that were conducted on the NLM Medlars system in the full year of 1996.
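The classification rule described in the abstract can be illustrated with a minimal sketch. It assumes the two page counts (pages containing the search string together with "health", and pages containing the search string alone) have already been obtained from a search engine; how the authors retrieved them is not stated here. The function names and the sample numbers are hypothetical, and only the 35% cut-off comes from the abstract.

```python
from typing import Iterable, Tuple

# Cut-off reported in the abstract: a rate larger than 35% means "health-related".
THRESHOLD = 0.35


def cooccurrence_rate(pages_with_health: int, pages_total: int) -> float:
    """Share of pages with the search string that also contain the word "health"."""
    if pages_total <= 0:
        raise ValueError("pages_total must be positive")
    return pages_with_health / pages_total


def is_health_related(pages_with_health: int, pages_total: int,
                      threshold: float = THRESHOLD) -> bool:
    """Apply the threshold rule; the comparison is strict ("larger than")."""
    return cooccurrence_rate(pages_with_health, pages_total) > threshold


def sensitivity_specificity(
    samples: Iterable[Tuple[int, int, bool]],
    threshold: float = THRESHOLD,
) -> Tuple[float, float]:
    """Compare the rule against human codings (the gold standard).

    Each sample is (pages_with_health, pages_total, human_says_health_related).
    """
    tp = fn = tn = fp = 0
    for with_health, total, human_label in samples:
        predicted = is_health_related(with_health, total, threshold)
        if human_label and predicted:
            tp += 1
        elif human_label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative human coding")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical page counts, for illustration only.
    demo = [(80, 100, True), (10, 100, False), (50, 100, False), (20, 100, True)]
    print(is_health_related(80, 100))
    print(sensitivity_specificity(demo))
```

Repeating `sensitivity_specificity` over a range of thresholds would yield the points of a ROC curve of the kind the abstract mentions. As a side note on the reported figures, 6.75 million health-related searches per day at a 4.5% share corresponds to a base of roughly 150 million searches per day (6.75 million / 0.045).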