Roberts Kirk, Demner-Fushman Dina
Lister Hill National Center for Biomedical Communications, US National Library of Medicine, National Institutes of Health 8600 Rockville Pike, Building 38A/1003H Bethesda, MD, 20894, USA
Lister Hill National Center for Biomedical Communications, US National Library of Medicine, National Institutes of Health 8600 Rockville Pike, Building 38A/1003H Bethesda, MD, 20894, USA.
J Am Med Inform Assoc. 2016 Jul;23(4):802-11. doi: 10.1093/jamia/ocw024. Epub 2016 May 4.
To understand how consumer questions on online resources differ from questions asked by professionals, and how such consumer questions differ across resources.
Ten online question corpora, 5 consumer and 5 professional, with a combined total of over 40 000 questions, were analyzed using a variety of natural language processing techniques. These techniques analyze questions at the lexical, syntactic, and semantic levels, exposing differences in both form and content.
Consumer questions tend to be longer than professional questions, more closely resemble open-domain language, and focus far more on medical problems. Consumers ask more sub-questions, provide far more background information, and ask different types of questions than professionals. Furthermore, there is substantial variance of these factors between the different consumer corpora.
The form of consumer questions is highly dependent upon the individual online resource, especially in the amount of background information provided. Professionals, on the other hand, provide very little background information and often ask much shorter questions. The content of consumer questions is also highly dependent upon the resource. While professional questions commonly discuss treatments and tests, consumer questions focus disproportionately on symptoms and diseases. Further, consumers place far more emphasis on certain types of health problems (eg, sexual health).
Websites for consumers to submit health questions are a popular online resource filling important gaps in consumer health information. By analyzing how consumers write questions on these resources, we can better understand these gaps and create solutions for improving information access.This article is part of the Special Focus on Person-Generated Health and Wellness Data, which published in the May 2016 issue, Volume 23, Issue 3.
了解消费者在网络资源上提出的问题与专业人士提出的问题有何不同,以及这些消费者问题在不同资源之间的差异。
使用多种自然语言处理技术对十个在线问题语料库进行分析,其中五个是消费者语料库,五个是专业人士语料库,总共包含超过40000个问题。这些技术在词汇、句法和语义层面分析问题,揭示形式和内容上的差异。
消费者提出的问题往往比专业人士提出的问题更长,更类似于开放领域语言,并且更多地聚焦于医疗问题。与专业人士相比,消费者会提出更多子问题,提供更多背景信息,且提出不同类型的问题。此外,不同消费者语料库之间这些因素存在很大差异。
消费者问题的形式高度依赖于具体的在线资源,特别是所提供背景信息的数量。另一方面,专业人士提供的背景信息很少,并且通常提出的问题要短得多。消费者问题的内容也高度依赖于资源。专业问题通常讨论治疗和检查,而消费者问题则不成比例地聚焦于症状和疾病。此外,消费者更加重视某些类型的健康问题(例如性健康)。
供消费者提交健康问题的网站是一种受欢迎的在线资源,填补了消费者健康信息方面的重要空白。通过分析消费者在这些资源上如何撰写问题,我们可以更好地理解这些空白,并创建改善信息获取的解决方案。本文是发表于2016年5月第23卷第3期的“个人生成的健康与 wellness 数据特别关注”的一部分。