Jiang Keyuan, Chen Tingyu, Huang Liyuan, Calix Ricardo A, Bernard Gordon R
Department of Computer Information Technology and Graphics, Purdue University Northwest, U.S.A.
Department of Medicine, Vanderbilt University, U.S.A.
Stud Health Technol Inform. 2018;247:136-140.
Twitter, a microblogging social media platform, has seen increasing use of its data for pharmacovigilance, the monitoring and promotion of the safe use of pharmaceutical products. Medication names are typically used as keywords to query social media data. Medication names are known to be misspelled on social media, and finding the misspellings is challenging because there is no a priori knowledge of how people will misspell a given name. We developed a data-driven, relational similarity-based approach to discover misspellings of medication names. The approach rests on the assumption that a medicine has the same (or a similar) association with its effects whether its name is spelled correctly or not. Using distributed representations of the words in tweets posted over the most recent 24 months, we discovered a total of 54 misspellings of 6 medicines whose indications include headache. Our search results also show that tweets containing misspellings of codeine and ibuprofen can account for more than 10% of all tweets associated with each of those medicines. Compared with a phonetics-based approach, our method discovered more misspellings that are actually used on Twitter.
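The relational-similarity idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how relational similarity was computed, so the sketch assumes a common formulation, the cosine between offset vectors (effect minus medicine versus effect minus candidate). The string-similarity filter, both thresholds, the function names, and the hand-made three-dimensional vectors are all hypothetical, chosen only to make the example self-contained. In practice the vectors would come from a word-embedding model trained on tweets.

```python
import math
from difflib import SequenceMatcher


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def offset(vectors, source, target):
    """Vector offset target - source, used here to represent a word relation."""
    return [t - s for s, t in zip(vectors[source], vectors[target])]


def candidate_misspellings(vectors, medicine, effect, min_rel=0.9, min_str=0.7):
    """Rank words whose relation to `effect` resembles that of `medicine`.

    Assumptions (not from the paper): relational similarity is the cosine
    between offset vectors, and a string-similarity filter (min_str) is used
    to separate misspellings from other medicines with the same effect.
    """
    reference = offset(vectors, medicine, effect)
    hits = []
    for word in vectors:
        if word in (medicine, effect):
            continue
        rel = cosine(reference, offset(vectors, word, effect))
        spelling = SequenceMatcher(None, medicine, word).ratio()
        if rel >= min_rel and spelling >= min_str:
            hits.append((word, round(rel, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy embeddings; real ones would be learned from tweet text.
TOY_VECTORS = {
    "headache": [0.0, 1.0, 0.0],
    "ibuprofen": [1.0, 0.0, 0.0],
    "ibuprofin": [0.9, 0.0, 0.1],
    "ibuprophen": [1.0, 0.1, 0.0],
    "tylenol": [0.95, 0.05, 0.0],
    "banana": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    for word, score in candidate_misspellings(TOY_VECTORS, "ibuprofen", "headache"):
        print(word, score)
```

On this toy data the function keeps "ibuprophen" and "ibuprofin" and drops "tylenol" and "banana". "tylenol" is the instructive case: its relation to "headache" is almost identical to that of "ibuprofen", so relational similarity alone cannot tell a misspelling from a different medicine with the same indication, which is why the sketch adds the assumed spelling filter.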