Using Natural Language Processing to Examine the Uptake, Content, and Readability of Media Coverage of a Pan-Canadian Drug Safety Research Project: Cross-Sectional Observational Study.

Author information

Mohammadhassanzadeh Hossein, Sketris Ingrid, Traynor Robyn, Alexander Susan, Winquist Brandace, Stewart Samuel Alan

Affiliations

Dalhousie University, Halifax, NS, Canada.

Nova Scotia Health Authority, Halifax, NS, Canada.

Publication information

JMIR Form Res. 2020 Jan 14;4(1):e13296. doi: 10.2196/13296.

Abstract

BACKGROUND

Isotretinoin, used to treat cystic acne, increases the risk of miscarriage and fetal abnormalities when taken during pregnancy. The Health Canada-approved product monograph for isotretinoin includes pregnancy prevention guidelines. A recent study by the Canadian Network for Observational Drug Effect Studies (CNODES) on the occurrence of pregnancy and pregnancy outcomes during isotretinoin therapy estimated poor adherence to these guidelines. Media uptake of this study was unknown; awareness of this uptake could help improve drug safety communication.

OBJECTIVE

The aim of this study was to understand how the media present pharmacoepidemiological research using the CNODES isotretinoin study as a case study.

METHODS

Google News was searched (April 25-May 6, 2016), using a predefined set of terms, for mentions of the CNODES study. In total, 26 articles and 3 CNODES publications (original article, press release, and podcast) were identified. The article texts were cleaned (eg, advertisements and links removed), and the podcast was transcribed. A dictionary of 1295 unique words was created using natural language processing (NLP) techniques (term frequency-inverse document frequency, Porter stemming, and stop-word filtering) to identify common words and phrases. Similarity between the articles and reference publications was calculated using Euclidean distance; articles were grouped using hierarchical agglomerative clustering. Nine readability scales were applied to measure text readability based on factors such as number of words, difficult words, syllables, sentence counts, and other textual metrics.
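The pipeline described above (stop-word filtering, stemming, term frequency-inverse document frequency weighting, Euclidean distance, and agglomerative clustering) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the stop-word list, the `crude_stem` suffix stripper (a stand-in for Porter stemming, not the real algorithm), the smoothed idf formula, the average-linkage rule, and the four toy sentences. The abstract does not state which linkage, idf variant, or software the authors used.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the study's actual list is not given.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
              "for", "on", "with", "that", "by", "are"}

def crude_stem(word):
    """Strip a few common suffixes (illustrative stand-in for Porter stemming)."""
    for suffix in ("ies", "ing", "ed", "s", "y"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return the vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed idf (one common variant; an assumption here).
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerative(vectors, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(vectors))]

    def average_linkage(c1, c2):
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy documents, invented for this example only.
docs = [
    "Isotretinoin use during pregnancy raises the risk of fetal abnormalities.",
    "Pregnancies during isotretinoin therapy: risk of miscarriage and fetal abnormalities.",
    "Accutane acne drug guidelines ignored by patients, doctors say.",
    "Doctors say acne drug Accutane guidelines are ignored.",
]
vocab, vectors = tfidf_vectors(docs)
clusters = agglomerative(vectors, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

In this toy run, the two sentences using "isotretinoin" and the two using "Accutane" end up in separate clusters, which mirrors the kind of synonym-driven grouping the study reports. Stemming also explains why a truncated form such as "pregnanc" can appear as a dictionary entry: "pregnancy" and "pregnancies" reduce to the same stem.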

RESULTS

The top 5 dictionary words were pregnancy (250 appearances), isotretinoin (220), study (209), drug (201), and women (185). Three distinct clusters were identified: Clusters 2 (5 articles) and 3 (4 articles) were from health-related websites and media, respectively; Cluster 1 (18 articles) contained largely media sources; 2 articles fell outside these clusters. Use of the term isotretinoin versus Accutane (a brand name of isotretinoin), discussion of pregnancy complications, and assignment of responsibility for guideline adherence varied between clusters. For example, the stemmed term "pregnanc" appeared most often in Clusters 1 (average of 14.6 times per article) and 2 (11.4) and relatively infrequently in Cluster 3 (1.8). The average reading grade level across all articles was high, meaning the texts were difficult to read (eg, Flesch-Kincaid, 13; Gunning Fog, 15; SMOG Index, 10; Coleman-Liau Index, 15; Linsear Write Index, 13; and Text Standard, 13). Readability improved from Cluster 2 (Gunning Fog of 16.9) to Cluster 3 (12.2). The reading level varied between clusters (average 13th-15th grade) but overall exceeded the recommended level for health information (6th to 8th grade).
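Grade-level scores such as those reported above are computed from simple counts of words, sentences, and syllables, and a higher score means a harder text. The sketch below implements the standard Flesch-Kincaid grade level and Gunning Fog formulas. The syllable counter is a rough vowel-group heuristic assumed for illustration, and the two sample texts are invented; published tools use more careful counting rules, so their scores will differ somewhat.

```python
import re

def count_syllables(word):
    """Heuristic syllable count: runs of vowels, minus a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def text_counts(text):
    """Return (sentence count, list of words, list of per-word syllable counts)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return len(sentences), words, [count_syllables(w) for w in words]

def flesch_kincaid_grade(text):
    """0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    n_sentences, words, syllables = text_counts(text)
    return 0.39 * len(words) / n_sentences + 11.8 * sum(syllables) / len(words) - 15.59

def gunning_fog(text):
    """0.4 * ((words / sentences) + 100 * (complex words / words)),
    where a complex word is taken here as one with 3 or more syllables."""
    n_sentences, words, syllables = text_counts(text)
    n_complex = sum(1 for s in syllables if s >= 3)
    return 0.4 * (len(words) / n_sentences + 100 * n_complex / len(words))

# Invented sample texts for comparison.
plain = "The cat sat on the mat. The dog ran."
technical = "Isotretinoin increases the risk of fetal abnormalities during pregnancy."

for label, text in (("plain", plain), ("technical", technical)):
    print(label, round(flesch_kincaid_grade(text), 1), round(gunning_fog(text), 1))
```

Both formulas reward short sentences and short words, which is why a single long sentence full of drug and clinical terminology scores well above the 6th to 8th grade range recommended for health information.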

CONCLUSIONS

Media interpretation of the CNODES study varied, with differences in synonym usage and areas of focus. All articles were written above the recommended health information reading level. Analyzing media using NLP techniques can help determine drug safety communication effectiveness. This project is important for understanding how drug safety studies are taken up and redistributed in the media.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9750/6996767/affe8a3c2099/formative_v4i1e13296_fig1.jpg
