Search Term Identification Methods for Computational Health Communication: Word Embedding and Network Approach for Health Content on YouTube.

Authors

Tong Chau, Margolin Drew, Chunara Rumi, Niederdeppe Jeff, Taylor Teairah, Dunbar Natalie, King Andy J

Affiliations

Department of Communication, Cornell University, Ithaca, NY, United States.

Department of Biostatistics, School of Global Public Health, New York University, New York, NY, United States.

Publication information

JMIR Med Inform. 2022 Aug 30;10(8):e37862. doi: 10.2196/37862.

DOI:10.2196/37862

PMID:36040760

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9472050/

Abstract

BACKGROUND

Common methods for extracting content in health communication research typically rely on a set of well-established queries, usually names of medical procedures or diseases, that are technical or rarely used in public discussion of health topics. Although these methods produce high precision (ie, retrieve highly relevant content), they tend to overlook health messages that feature colloquial language and layperson vocabularies on social media. Given that such messages could contain misinformation or obscure content that circumvents official medical concepts, correctly identifying (and analyzing) them is crucial to the study of user-generated health content on social media platforms.

OBJECTIVE

Health communication scholars would benefit from a retrieval process that goes beyond the use of standard terminologies as search queries. Motivated by this, this study aims to put forward a search term identification method to improve the retrieval of user-generated health content on social media. We focused on cancer screening tests as a subject and YouTube as a platform case study.

METHODS

We retrieved YouTube videos using cancer screening procedures (colonoscopy, fecal occult blood test, mammogram, and pap test) as seed queries. We then trained word embedding models using text features from these videos to identify the nearest neighbor terms that are semantically similar to cancer screening tests in colloquial language. Retrieving more YouTube videos from the top neighbor terms, we coded a sample of 150 random videos from each term for relevance. We then used text mining to examine the new content retrieved from these videos and network analysis to inspect the relations between the newly retrieved videos and videos from the seed queries.
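The nearest-neighbor step described above can be illustrated with a minimal sketch. It assumes word vectors have already been trained; the abstract does not say which embedding model or similarity measure was used, so cosine similarity is an assumption here, and the three-dimensional vectors and the terms `prep`, `polyp`, and `recipe` are invented purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(seed, vectors, k=5):
    """Return the k (term, similarity) pairs closest to `seed`, excluding the seed."""
    seed_vec = vectors[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in vectors.items()
        if term != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, NOT taken from the study: real embeddings would be learned
# from video text features and have far more dimensions.
toy_vectors = {
    "colonoscopy": [0.9, 0.1, 0.0],
    "prep": [0.8, 0.3, 0.1],
    "polyp": [0.7, 0.2, 0.2],
    "recipe": [0.0, 0.1, 0.9],
}

neighbors = nearest_neighbors("colonoscopy", toy_vectors, k=2)
for term, score in neighbors:
    print(f"{term}: {score:.3f}")
```

In a workflow like the one in the abstract, the top-ranked neighbor terms would then serve as new search queries, with a coded sample of the retrieved videos used to check their relevance.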

RESULTS

The top terms with semantic similarities to cancer screening tests were identified via word embedding models. Text mining analysis showed that the 5 nearest neighbor terms retrieved content that was novel and contextually diverse, beyond the content retrieved from cancer screening concepts alone. Results from network analysis showed that the newly retrieved videos had at least one total degree of connection (sum of indegree and outdegree) with seed videos according to YouTube relatedness measures.
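The total-degree check reported above can be sketched as follows. This is a hedged illustration only: the edge list, the video identifiers, and the reading of an edge `(a, b)` as "video b is listed as related to video a" are all assumptions, since the abstract does not describe how the YouTube relatedness data were structured.

```python
from collections import defaultdict


def total_degree_with_seeds(edges, seed_videos):
    """Count, for each non-seed video, its directed edges shared with seed videos.

    Total degree = edges arriving from a seed (indegree)
                 + edges pointing to a seed (outdegree).
    """
    seeds = set(seed_videos)
    degree = defaultdict(int)
    for source, target in edges:
        if source in seeds and target not in seeds:
            degree[target] += 1  # indegree from a seed video
        elif target in seeds and source not in seeds:
            degree[source] += 1  # outdegree toward a seed video
    return dict(degree)


# Hypothetical relatedness edges, invented for illustration.
toy_edges = [
    ("seed_a", "new_1"),
    ("new_1", "seed_b"),
    ("new_2", "seed_a"),
    ("new_3", "new_1"),
    ("seed_a", "seed_b"),
]
seed_videos = ["seed_a", "seed_b"]
new_videos = ["new_1", "new_2", "new_3"]

degrees = total_degree_with_seeds(toy_edges, seed_videos)
unconnected = [video for video in new_videos if degrees.get(video, 0) == 0]
print(degrees)
print(unconnected)
```

Videos left in `unconnected` have no direct relatedness link to any seed video, so a list like this could flag candidates for closer manual review.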

CONCLUSIONS

We demonstrated a retrieval technique to improve recall and minimize precision loss, which can be extended to various health topics on YouTube, a popular video-sharing social media platform. We discussed how health communication scholars can apply the technique to inspect the performance of the retrieval strategy before investing human coding resources and outlined suggestions on how such a technique can be extended to other health contexts.
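One way to inspect retrieval performance from a coded sample is to estimate precision for each search term along with an uncertainty interval. The sketch below is an assumption-laden illustration: the count of 120 relevant videos is hypothetical (the abstract reports no counts), and the Wilson score interval is a choice made here, not a method attributed to the authors.

```python
import math


def precision_with_wilson_interval(relevant, sample_size, z=1.96):
    """Return (precision, lower, upper) using the Wilson score interval.

    `relevant` is the number of sampled videos coded as relevant;
    z=1.96 corresponds to an approximate 95% interval.
    """
    p = relevant / sample_size
    denom = 1 + z ** 2 / sample_size
    center = (p + z ** 2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / sample_size + z ** 2 / (4 * sample_size ** 2))
        / denom
    )
    return p, center - half_width, center + half_width


# Hypothetical coding result: 120 of 150 sampled videos judged relevant.
precision, lower, upper = precision_with_wilson_interval(120, 150)
print(f"precision={precision:.2f}, interval=({lower:.3f}, {upper:.3f})")
```

A term whose lower bound falls below a threshold chosen by the research team could be dropped before further human coding is invested in it.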

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5dfe/9472050/7b17503239fd/medinform_v10i8e37862_fig1.jpg
