School of Information Science and Technology, Northwest University, Xi'an, 710127, China.
J Biomed Inform. 2022 Nov;135:104192. doi: 10.1016/j.jbi.2022.104192. Epub 2022 Sep 3.
The extraction of drug-drug interactions (DDIs) is an important task in biomedical research, as it can reduce unexpected health risks during patient treatment. Previous work indicates that methods using external drug information perform much better than methods that do not use it. However, using external drug information is time-consuming and resource-intensive. In this work, we propose a novel DDI extraction method that does not use external drug information but still achieves comparable performance. First, instead of converting drug names to standard placeholder tokens such as DRUG0, the approach commonly used in previous research, we input the full drug names to BioBERT with entity markers around the selected drug pair, which lets the model focus on that pair. Second, we adopt the Key Semantic Sentence approach to emphasize the words most closely related to the DDI relation of the selected drug pair. Together, these steps significantly reduce the misclassification of similar instances, that is, instances created from the same sentence but corresponding to different drug entity pairs. We then employ the Gradient Harmonizing Mechanism (GHM) loss to reduce the weight of mislabeled instances and of easy-to-classify instances, both of which can degrade DDI extraction performance. Overall, we show that it is better not to use drug blinding with BioBERT, and that GHM loss outperforms cross-entropy loss when the proportion of label noise is below 30%. The proposed model achieves a state-of-the-art F1-score of 84.13% on the DDIExtraction 2013 corpus (a standard English DDI corpus), closing the 4% performance gap between methods that rely on external drug information and those that do not.
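The difference between drug blinding and entity marking can be illustrated with a small preprocessing sketch. It is an illustration under stated assumptions, not the authors' pipeline: the DRUG1/DRUG2/DRUG0 placeholder scheme, the hypothetical `[E1]`/`[/E1]` and `[E2]`/`[/E2]` marker strings, the helper names `blind`, `mark`, and `_rewrite`, and the example sentence are all assumptions made for this example. Drug mentions are given as character spans, and each candidate pair yields one instance.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of one drug mention


def _rewrite(sentence: str, drug_spans: Sequence[Span],
             render: Callable[[int, str], str]) -> str:
    """Rebuild the sentence, replacing drug mention i with render(i, mention).

    Assumes drug_spans are sorted by start offset and do not overlap.
    """
    parts: List[str] = []
    last = 0
    for i, (start, end) in enumerate(drug_spans):
        parts.append(sentence[last:start])          # text before this drug
        parts.append(render(i, sentence[start:end]))  # rewritten drug mention
        last = end
    parts.append(sentence[last:])                   # trailing text
    return "".join(parts)


def blind(sentence: str, drug_spans: Sequence[Span], pair: Tuple[int, int]) -> str:
    """Drug blinding (assumed scheme): target pair -> DRUG1/DRUG2, other drugs -> DRUG0."""
    def render(i: int, _mention: str) -> str:
        if i == pair[0]:
            return "DRUG1"
        if i == pair[1]:
            return "DRUG2"
        return "DRUG0"
    return _rewrite(sentence, drug_spans, render)


def mark(sentence: str, drug_spans: Sequence[Span], pair: Tuple[int, int]) -> str:
    """Entity marking: keep all drug names, wrap only the target pair in hypothetical markers."""
    def render(i: int, mention: str) -> str:
        if i == pair[0]:
            return f"[E1]{mention}[/E1]"
        if i == pair[1]:
            return f"[E2]{mention}[/E2]"
        return mention
    return _rewrite(sentence, drug_spans, render)


if __name__ == "__main__":
    s = "Aspirin may increase the anticoagulant effect of warfarin but not of heparin."
    spans = [(s.find(d), s.find(d) + len(d)) for d in ("Aspirin", "warfarin", "heparin")]
    for pair in [(0, 1), (0, 2)]:
        print(blind(s, spans, pair))
        print(mark(s, spans, pair))
```

For the pairs (Aspirin, warfarin) and (Aspirin, heparin), the blinded inputs differ only in where DRUG2 and DRUG0 appear ("DRUG1 may increase the anticoagulant effect of DRUG2 but not of DRUG0." versus "... of DRUG0 but not of DRUG2."). The marked inputs keep the real drug names, which BioBERT's pretraining may have seen, while still indicating which pair is being classified.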
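A minimal sketch of how a GHM classification loss can reweight cross-entropy, following the general GHM-C idea of weighting each instance by the inverse density of its gradient norm. This is not the authors' implementation. Several points are assumptions: the multi-class gradient norm is taken as g = 1 - p_y (the magnitude of the cross-entropy gradient with respect to the true-class logit), the bin count of 10 is arbitrary, the function name `ghm_c_loss` is hypothetical, and the moving-average density update used in some GHM implementations is omitted.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one instance's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ghm_c_loss(batch_logits: Sequence[Sequence[float]],
               labels: Sequence[int],
               bins: int = 10) -> Tuple[float, List[float]]:
    """Cross-entropy reweighted by the inverse gradient density (GHM-C style sketch).

    Assumption: gradient norm g = 1 - p_y for the true class y of each instance.
    Returns the mean weighted loss and the per-instance weights.
    """
    n = len(labels)
    probs = [softmax(z) for z in batch_logits]
    g = [1.0 - p[y] for p, y in zip(probs, labels)]
    ce = [-math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)]

    # Split [0, 1] into `bins` equal-width regions; g == 1.0 goes into the last one.
    bin_ids = [min(int(gi * bins), bins - 1) for gi in g]
    counts = [0] * bins
    for b in bin_ids:
        counts[b] += 1
    non_empty = sum(1 for c in counts if c > 0)

    # Weight = N / (instances sharing the bin), divided by the number of non-empty
    # bins: densely populated g regions (many easy instances, or a cluster of
    # near-certain "errors" such as mislabeled ones) are down-weighted.
    weights = [n / (counts[b] * non_empty) for b in bin_ids]
    loss = sum(w * l for w, l in zip(weights, ce)) / n
    return loss, weights


if __name__ == "__main__":
    easy = [[5.0, 0.0, 0.0]] * 6                  # label 0, confidently correct: g ~ 0.01
    medium = [[0.0, 0.5, -10.0]]                  # label 0, uncertain: g ~ 0.62
    hard = [[0.0, 3.0, 0.0], [0.0, 3.0, 0.0],     # label 0, confidently wrong: g ~ 0.95
            [5.0, 0.0, 0.0]]                      # label 1, possibly mislabeled: g ~ 0.99
    labels = [0] * 6 + [0] + [0, 0, 1]
    loss, weights = ghm_c_loss(easy + medium + hard, labels)
    print([round(w, 3) for w in weights])
    # -> [0.556, 0.556, 0.556, 0.556, 0.556, 0.556, 3.333, 1.111, 1.111, 1.111]
```

In this toy batch, the six easy instances and the cluster of three near-certain errors both receive smaller weights than the single moderately difficult instance, which mirrors the abstract's motivation for GHM: reducing the influence of easy-to-classify and mislabeled instances. The weights always sum to the batch size, so the overall loss scale stays comparable to unweighted cross-entropy.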