IEEE Trans Nanobioscience. 2019 Jul;18(3):324-334. doi: 10.1109/TNB.2019.2909094. Epub 2019 Apr 4.
Serendipitous drug usage refers to the unexpected relief of comorbid diseases or symptoms when taking medication for a different known indication. Historically, serendipity has contributed significantly to identifying many new drug indications. If patient-reported serendipitous drug usage in social media could be computationally identified, it could help generate and validate drug-repositioning hypotheses. We investigated deep neural network models for mining serendipitous drug usage from social media. We used the word2vec algorithm to construct word-embedding features from drug reviews posted in a WebMD patient forum. We adapted and redesigned the convolutional neural network, long short-term memory network, and convolutional long short-term memory network by adding contextual information extracted from drug-review posts, information-filtering tools, medical ontology, and medical knowledge. We trained, tuned, and evaluated our models with a gold-standard dataset of 15,714 sentences, of which 447 (2.8%) described serendipitous drug usage. Additionally, we compared our deep neural networks to support vector machine, random forest, and AdaBoost.M1 algorithms. Context information helped reduce the false-positive rate of deep neural network models. When trained on an extremely imbalanced dataset with few instances of serendipitous drug usage, deep neural network models did not outperform other machine-learning models that used n-gram and context features. However, deep neural network models could use word embeddings more effectively in feature construction, an advantage that makes them worthy of further investigation. Finally, we implemented natural-language processing and machine-learning methods in a web-based application to help scientists and software developers mine social media for serendipitous drug usage.
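To make the class imbalance concrete, the following is a minimal sketch in Python that uses only the two counts reported in the abstract (15714 sentences, 447 positive). The helper `confusion_metrics` and the "label everything negative" baseline are illustrative assumptions, not part of the authors' pipeline. They show why overall accuracy is a poor yardstick here and why the false-positive rate and recall matter more.

```python
# Counts taken from the abstract; everything else is a hypothetical illustration.
TOTAL_SENTENCES = 15714
POSITIVE_SENTENCES = 447  # sentences describing serendipitous drug usage


def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and false-positive rate.

    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


negatives = TOTAL_SENTENCES - POSITIVE_SENTENCES
positive_share = POSITIVE_SENTENCES / TOTAL_SENTENCES

# Hypothetical baseline: label every sentence as negative.
all_negative = confusion_metrics(tp=0, fp=0, fn=POSITIVE_SENTENCES, tn=negatives)

print(f"positive share: {positive_share:.1%}")
print(f"all-negative accuracy: {all_negative['accuracy']:.1%}")
print(f"all-negative recall: {all_negative['recall']:.1%}")
```

A classifier that never flags a sentence would look highly accurate on this dataset while finding no serendipitous usage at all, so any model comparison on it is better read through recall, precision, and false-positive rate.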