Computer Science Department, Carlos III University of Madrid, Leganés, Spain.
J Biomed Inform. 2011 Oct;44(5):789-804. doi: 10.1016/j.jbi.2011.04.005. Epub 2011 Apr 24.
A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. Information Extraction (IE) techniques can help health care professionals reduce the time they spend reviewing the literature for potential drug-drug interactions. Nevertheless, no approach has been proposed for extracting DDIs from biomedical texts. In this article, we study whether a machine learning-based method is appropriate for DDI extraction from biomedical texts and whether its results are superior to those obtained with our previously proposed pattern-based approach. The method proposed here for DDI extraction is based on a supervised machine learning technique, specifically the shallow linguistic kernel proposed in Giuliano et al. (2006). Since no benchmark corpus was available to evaluate our approach to DDI extraction, we created the first such corpus, DrugDDI, annotated with 3169 DDIs. We performed several experiments varying the configuration parameters of the shallow linguistic kernel. The model that maximizes the F-measure was evaluated on the test data of the DrugDDI corpus, achieving a precision of 51.03%, a recall of 72.82% and an F-measure of 60.01%. To the best of our knowledge, this work proposes the first full solution for the automatic extraction of DDIs from biomedical texts. Our study confirms that the shallow linguistic kernel outperforms our previous pattern-based approach. Additionally, we hope that the DrugDDI corpus will allow researchers to explore new solutions to the DDI extraction problem.
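The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` and its `beta` parameter are illustrative rather than taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the harmonic mean (F1).

    Assumption: the paper's "F-measure" is the balanced F1 score.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 51.03
reported_recall = 72.82

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.2f}%")  # prints "F-measure: 60.01%"
```

Under this assumption the three reported figures are mutually consistent: recall is markedly higher than precision, and the harmonic mean sits closer to the lower of the two.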