Enriching consumer health vocabulary through mining a social Q&A site: A similarity-based approach.

Author information

He Zhe, Chen Zhiwei, Oh Sanghee, Hou Jinghui, Bian Jiang

Affiliations

School of Information, Florida State University, Tallahassee, FL 32306, USA; Institute for Successful Longevity, Florida State University, Tallahassee, FL 32306, USA.

Department of Computer Science, Florida State University, Tallahassee, FL 32306, USA.

Publication information

J Biomed Inform. 2017 May;69:75-85. doi: 10.1016/j.jbi.2017.03.016. Epub 2017 Mar 27.

Abstract

The widely known vocabulary gap between health consumers and healthcare professionals hinders consumers' information seeking and health dialogue in end-user health applications. The Open Access and Collaborative Consumer Health Vocabulary (OAC CHV), which contains health-related terms used by lay consumers, was created to bridge this gap. Specifically, the OAC CHV facilitates consumers' health information retrieval by enabling consumer-facing health applications to translate between professional language and consumer-friendly language. To keep up with constantly evolving medical knowledge and language use, new terms need to be identified and added to the OAC CHV. User-generated content on social media, including social question and answer (social Q&A) sites, affords an enormous opportunity for mining consumer health terms. Existing methods of identifying new consumer terms from text typically rely on ad hoc lexico-syntactic patterns and human review. Our study extends an existing method by extracting n-grams from a social Q&A textual corpus and representing them with a rich set of contextual and syntactic features. Using K-means clustering, our method, simiTerm, was able to identify terms that are both contextually and syntactically similar to existing OAC CHV terms. We tested our method on social Q&A corpora in two disease domains: diabetes and cancer. Our method outperformed three baseline ranking methods. A post-hoc qualitative evaluation by human experts further validated that our method can effectively identify meaningful new consumer terms on social Q&A sites.
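The general idea described in the abstract (extract n-grams, represent them as feature vectors, cluster them with K-means, and keep the n-grams that share a cluster with known vocabulary terms) can be illustrated with a minimal sketch. This is not the simiTerm implementation: the function names, the two-dimensional feature vectors, the seed term, and the choice of k are all hypothetical, and the hand-written numbers merely stand in for the contextual and syntactic features the abstract mentions.

```python
import math


def extract_ngrams(tokens, max_n=2):
    """Return all contiguous n-grams (as strings) of length 1..max_n."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means. The first k points seed the centroids,
    which keeps this toy example deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def candidates_near_seeds(terms, vectors, seed_terms, k):
    """Return non-seed terms that land in a cluster containing a seed term."""
    labels = kmeans(vectors, k)
    seed_clusters = {lab for t, lab in zip(terms, labels) if t in seed_terms}
    return [t for t, lab in zip(terms, labels)
            if lab in seed_clusters and t not in seed_terms]


# Hypothetical candidate n-grams with made-up 2-D feature vectors.
terms = ["blood sugar", "i think", "sugar level",
         "you know", "high sugar", "thank you"]
vectors = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2),
           (0.2, 0.8), (0.85, 0.15), (0.15, 0.85)]
seeds = {"blood sugar"}  # assumed to already be in the vocabulary

print(extract_ngrams("my blood sugar".split()))
print(candidates_near_seeds(terms, vectors, seeds, k=2))
# ['sugar level', 'high sugar']
```

In this toy setup the n-grams whose vectors sit close to the seed term end up in its cluster and are surfaced as candidates, while conversational filler falls into the other cluster. A real pipeline would compute the feature vectors from the corpus and would need a principled way to choose the number of clusters.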
