

Consumers' Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites.

Author information

Park Min Sook, He Zhe, Chen Zhiwei, Oh Sanghee, Bian Jiang

Affiliations

School of Information, Florida State University, Tallahassee, FL, United States.

Institute for Successful Longevity, Florida State University, Tallahassee, FL, United States.

Publication information

JMIR Med Inform. 2016 Nov 24;4(4):e41. doi: 10.2196/medinform.5748.

Abstract

BACKGROUND

The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers.

OBJECTIVE

The aim of this study was to better understand consumers' usage of medical concepts by evaluating the coverage of concepts and semantic types of the Unified Medical Language System (UMLS) in diabetes-related postings on 2 types of social media: blogs and social question and answer (Q&A) sites.

METHODS

We collected 2 types of social media data: (1) a total of 3711 blogs tagged with "diabetes" on Tumblr posted between February and October 2015; and (2) a total of 58,422 questions and associated answers posted between 2009 and 2014 in the diabetes category of Yahoo! Answers. We analyzed the datasets using Apache cTAKES, a widely adopted biomedical text processing framework, and its extension YTEX. First, we applied the named entity recognition (NER) method implemented in YTEX to identify UMLS concepts in the datasets. We then analyzed the coverage and the popularity of concepts in the UMLS source vocabularies across the 2 datasets (ie, blogs and social Q&A). Further, we conducted a concept-level comparative coverage analysis between SNOMED Clinical Terms (SNOMED CT) and Open-Access Collaborative Consumer Health Vocabulary (OAC CHV), the 2 UMLS source vocabularies with the most coverage of our datasets. We also analyzed the UMLS semantic types that were frequently observed in our datasets.
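The coverage analysis described above can be illustrated with a minimal Python sketch. It is not the authors' cTAKES/YTEX pipeline: it assumes the NER step has already produced a mapping from each identified concept to the set of source vocabularies that contain it. The function name `vocabulary_coverage`, the placeholder concept identifiers, and the toy data are all hypothetical.

```python
from collections import Counter


def vocabulary_coverage(concept_sources):
    """Return {vocabulary: percentage of identified concepts it contains}.

    concept_sources maps a concept identifier to the set of source
    vocabularies in which that concept appears (assumed input format).
    """
    total = len(concept_sources)
    if total == 0:
        return {}
    counts = Counter()
    for sources in concept_sources.values():
        # Count each vocabulary at most once per concept.
        counts.update(set(sources))
    return {vocab: 100.0 * n / total for vocab, n in counts.items()}


# Toy data: placeholder concept IDs, not real UMLS identifiers.
toy_concepts = {
    "concept_a": {"SNOMEDCT_US", "CHV", "MTH"},
    "concept_b": {"SNOMEDCT_US", "CHV"},
    "concept_c": {"SNOMEDCT_US"},
    "concept_d": {"CHV"},
}

coverage = vocabulary_coverage(toy_concepts)
for vocab, pct in sorted(coverage.items()):
    print(f"{vocab}: {pct:.1f}%")
```

Running the same function separately on the concepts from blogs, questions, and answers would give one coverage figure per vocabulary per dataset, which is the shape of the comparison reported in the abstract.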

RESULTS

We identified 2415 UMLS concepts from blog postings, 6452 UMLS concepts from social Q&A questions, and 10,378 UMLS concepts from the answers. The medical concepts identified in the blogs can be covered by 56 source vocabularies in the UMLS, while those in questions and answers can be covered by 58 source vocabularies. SNOMED CT was the dominant vocabulary in terms of coverage across all the datasets, ranging from 84.9% to 95.9%. It was followed by OAC CHV (between 73.5% and 80.0%) and Metathesaurus Names (MTH) (between 55.7% and 73.5%). All of the social media datasets shared frequent semantic types such as "Amino Acid, Peptide, or Protein," "Body Part, Organ, or Organ Component," and "Disease or Syndrome."

CONCLUSIONS

Although the 3 social media datasets vary greatly in size, they exhibited similar conceptual coverage among UMLS source vocabularies and the identified concepts showed similar semantic type distributions. As such, concepts that are both frequently used by consumers and also found in professional vocabularies such as SNOMED CT can be suggested to OAC CHV to improve its coverage.
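One way to read this suggestion is as a frequency-filtered set difference: concepts that consumers mention often, that appear in SNOMED CT, and that are absent from OAC CHV. The sketch below is a hypothetical illustration of that idea, not the procedure used in the study; the function name `chv_candidates`, the `min_count` threshold, and the placeholder concept identifiers are assumptions.

```python
from collections import Counter


def chv_candidates(mentions, snomed_concepts, chv_concepts, min_count=2):
    """List concepts mentioned at least min_count times that are in
    SNOMED CT but missing from OAC CHV, most frequent first."""
    freq = Counter(mentions)
    candidates = [
        concept
        for concept, n in freq.items()
        if n >= min_count
        and concept in snomed_concepts
        and concept not in chv_concepts
    ]
    # Sort by descending frequency, then by identifier for stable output.
    return sorted(candidates, key=lambda c: (-freq[c], c))


# Toy data: placeholder concept IDs, not real UMLS identifiers.
mentions = ["a", "a", "a", "b", "b", "c", "d", "d", "e", "e"]
snomed_concepts = {"a", "b", "c", "e"}
chv_concepts = {"b"}

suggested = chv_candidates(mentions, snomed_concepts, chv_concepts)
print(suggested)
```

The threshold for "frequently used" is a design choice the abstract does not specify, so `min_count` is left as a parameter here.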


