Kavuluru Ramakanth, Rios Anthony, Tran Tung
Division of Biomedical Informatics, Dept. of Internal Medicine, University of Kentucky, Lexington, KY.
Department of Computer Science, University of Kentucky, Lexington, KY.
Proc (IEEE Int Conf Healthc Inform). 2017 Aug;2017:5-12. doi: 10.1109/ICHI.2017.15. Epub 2017 Sep 14.
Drug-drug interactions (DDIs) are known to be responsible for nearly a third of all adverse drug reactions. Hence, several current efforts focus on extracting signal from electronic medical records (EMRs) to prioritize DDIs that need further exploration. To this end, extracting explicit mentions of DDIs from free-text narratives is an important task. In this paper, we explore recurrent neural network (RNN) architectures to detect and classify DDIs in unstructured text using the DDIExtraction dataset from the SemEval 2013 (task 9) shared task. Our methods are in line with those used in other recent deep learning efforts for relation extraction, including DDI extraction. However, to our knowledge, we are the first to investigate the potential of character-level RNNs (Char-RNNs) for DDI extraction (and relation extraction in general). Furthermore, we explore a simple but effective model bootstrapping method to (a) build model averaging ensembles, (b) derive confidence intervals around mean micro-F scores (MMF), and (c) assess the average behavior of our methods. Without any rule-based filtering of negative examples, a popular heuristic used by most earlier efforts, we achieve an MMF of 69.13. By adding simple, replicable heuristics to filter negative instances, we achieve an MMF of 70.38. Our best ensembles produce micro-F scores of 70.81 (without filtering) and 72.13 (with filtering), which are higher than those reported in previously published results. Although Char-RNNs turn out to be inferior to regular word-based RNN models in overall comparisons, we find that ensembling models from both architectures yields nontrivial gains over using either alone, indicating that the two complement each other.
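The abstract does not spell out the bootstrapping procedure, so the following is only a minimal sketch of one way such a scheme could look, not the authors' implementation. It assumes a pool of already-trained models, each represented by its per-instance class probabilities on the test set, and it assumes label 0 denotes "no interaction". The names `micro_f`, `average_ensemble`, and `bootstrap_ensembles` are hypothetical, and the percentile interval is an assumed choice. Scores here are on a 0 to 1 scale rather than the percentage scale used above.

```python
import random
from statistics import mean

NEGATIVE = 0  # assumption: label 0 means "no interaction"


def micro_f(gold, pred, negative=NEGATIVE):
    """Micro-averaged F-score over all non-negative classes."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == p and g != negative)
    fp = sum(1 for g, p in pairs if p != negative and g != p)
    fn = sum(1 for g, p in pairs if g != negative and g != p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_ensemble(prob_sets):
    """Average class probabilities across models and return argmax labels.

    prob_sets[m][i] is model m's probability list for instance i.
    """
    n_models = len(prob_sets)
    labels = []
    for per_model in zip(*prob_sets):  # distributions for one instance
        avg = [sum(col) / n_models for col in zip(*per_model)]
        labels.append(max(range(len(avg)), key=avg.__getitem__))
    return labels


def bootstrap_ensembles(gold, pool, ensemble_size,
                        n_rounds=1000, alpha=0.05, seed=0):
    """Resample models from the pool (with replacement), score each
    averaged ensemble, and return the mean micro-F together with a
    percentile confidence interval."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        sample = [rng.choice(pool) for _ in range(ensemble_size)]
        scores.append(micro_f(gold, average_ensemble(sample)))
    scores.sort()
    lo = scores[int((alpha / 2) * n_rounds)]
    hi = scores[min(n_rounds - 1, int((1 - alpha / 2) * n_rounds))]
    return mean(scores), (lo, hi)
```

Under these assumptions, mixing word-level and character-level models in the same `pool` would be one way to probe whether the two architectures complement each other.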