Searching for cancer information on the internet: analyzing natural language search queries.

Author information

Bader Judith L, Theofanos Mary Frances

Affiliation

National Cancer Institute, Office of Communications, Cancer Information Products and Services, Communications Technology Branch, Bethesda, MD 20852, USA.

Publication information

J Med Internet Res. 2003 Dec 11;5(4):e31. doi: 10.2196/jmir.5.4.e31.

Abstract

BACKGROUND

Searching for health information is one of the most common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results.

OBJECTIVE

To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used.

METHODS

The National Cancer Institute partnered with AskJeeves, Inc. to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine that receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. Of these 500 terms, only 37 appeared ≥5 times/day over the trial test week, in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented.
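The selection steps described above (filter the benchmark terms by daily frequency, match queries against the surviving terms, then draw a random sample of questions) can be sketched roughly as follows. This is an illustrative sketch only, not the procedure AskJeeves actually ran: the function names are hypothetical, matching is assumed to be case-insensitive substring matching on lowercase terms, "≥5 times/day" is assumed to mean at least 5 matching queries on every day of the trial week, and the sample is assumed to be drawn from distinct question strings.

```python
import random


def term_hits(queries, term):
    """Count queries containing the term (case-insensitive substring match)."""
    return sum(1 for q in queries if term in q.lower())


def frequent_terms(queries_by_day, benchmark_terms, min_per_day=5):
    """Keep terms seen in at least min_per_day queries on every day (assumed reading)."""
    return [
        t for t in benchmark_terms
        if all(term_hits(day, t) >= min_per_day for day in queries_by_day)
    ]


def cancer_queries(queries, terms):
    """Return the queries that mention at least one of the retained terms."""
    return [q for q in queries if any(t in q.lower() for t in terms)]


def sample_distinct(queries, k, seed=0):
    """Randomly pick up to k distinct question strings (hypothetical sampling step)."""
    distinct = sorted(set(queries))
    rng = random.Random(seed)
    return rng.sample(distinct, min(k, len(distinct)))
```

With the figures in the abstract, `frequent_terms` would correspond to reducing 500 benchmark terms to 37, `cancer_queries` to finding the 204165 query instances, and `sample_distinct` to selecting the 7500 questions.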

RESULTS

Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall, 78.37% of sampled cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized.
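The coverage figure can be checked with simple arithmetic: 76077 of 204165 query instances is about 37%. The sketch below shows one way such a share could be computed from a query log. The function name is hypothetical, and it assumes repeated questions are counted by exact string match, which the abstract does not specify.

```python
from collections import Counter


def represented_share(sampled_questions, all_queries):
    """Count the log entries that repeat a sampled question, and their share of the log."""
    counts = Counter(all_queries)
    represented = sum(counts[q] for q in set(sampled_questions))
    return represented, represented / len(all_queries)


# Figures reported in the abstract: 76077 of 204165 query instances.
reported_share = round(76077 / 204165 * 100, 1)
print(reported_share)  # 37.3
```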

CONCLUSIONS

Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8218/1550578/f5cbc5705e3a/jmir_v5i4e31_fig1.jpg