School of Computational Science & Engineering, Georgia Institute of Technology, Atlanta, USA; Office of Strategy and Innovation, National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, USA.
Office of Strategy and Innovation, National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, USA.
J Biomed Inform. 2021 Jul;119:103824. doi: 10.1016/j.jbi.2021.103824. Epub 2021 May 26.
Substances involved in overdose deaths have shifted over time and continue to undergo transition. Early detection of emerging drugs involved in overdose is a major challenge for traditional public health data systems. While novel social media data have shown promise, there is a continued need for robust natural language processing approaches that can identify emerging substances. Consequently, we developed a new metric, the relative similarity ratio, based on diachronic word embeddings to measure movement in the semantic proximity of individual substance words to 'overdose' over time. Our analysis of 64,420,376 drug-related posts made between January 2011 and December 2018 on Reddit, the largest online forum site, reveals that this approach successfully identified fentanyl, the most significant emerging substance in the overdose epidemic, >1 year earlier than traditional public health data systems. Use of diachronic word embeddings may enable improved identification of emerging substances involved in drug overdose, thereby improving the timeliness of prevention and treatment activities.
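The abstract names the relative similarity ratio but does not give its formula, so the following is only a minimal sketch under a stated assumption: the ratio is taken here to be the cosine similarity between a substance word and 'overdose' in a given period, divided by the same similarity in a baseline period. The function names, the `baseline` parameter, and the two-dimensional toy vectors are all hypothetical and are not taken from the paper. In practice the vectors would come from embedding models trained separately on each period's posts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relative_similarity_ratio(embeddings_by_period, word, anchor="overdose", baseline=None):
    """Assumed definition (not confirmed by the abstract):
    similarity(word, anchor) in each period divided by the
    similarity in the baseline period (earliest period by default)."""
    periods = sorted(embeddings_by_period)
    if baseline is None:
        baseline = periods[0]
    base_vectors = embeddings_by_period[baseline]
    base_sim = cosine(base_vectors[word], base_vectors[anchor])
    if base_sim == 0:
        raise ValueError("baseline similarity is zero; ratio is undefined")
    return {
        p: cosine(embeddings_by_period[p][word], embeddings_by_period[p][anchor]) / base_sim
        for p in periods
    }


# Hypothetical toy vectors, for illustration only (not real data).
toy_embeddings = {
    2011: {"overdose": [1.0, 0.0], "fentanyl": [0.6, 0.8]},
    2014: {"overdose": [1.0, 0.0], "fentanyl": [0.8, 0.6]},
}

ratios = relative_similarity_ratio(toy_embeddings, "fentanyl")
for year, ratio in ratios.items():
    print(year, round(ratio, 3))
```

In this toy case the ratio rises above 1 in the later period, which is the kind of movement toward 'overdose' the metric is meant to flag. Because each similarity is computed inside a single period's embedding space, this sketch does not need the periods' spaces to be aligned with one another; whether the authors aligned their embeddings is not stated in the abstract.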