IEEE J Biomed Health Inform. 2020 Nov;24(11):3315-3325. doi: 10.1109/JBHI.2020.2983365. Epub 2020 Nov 4.
Understanding chemical-disease relations (CDR) is a crucial task in various biomedical domains. Manually mining this information from the biomedical literature is costly and time-consuming. To address these issues, various studies have sought to design an efficient automatic tool. In this paper, we propose a multi-view deep neural network model for the CDR task. Typically, multiple representations (or views) of the dataset are not available for this task. We therefore train multiple conceptually different deep neural network models on the dataset to generate different abstract features, which are treated as different views. A novel loss function, "Penalized LF", is defined to address the problem of dataset imbalance. The proposed loss function is generic in nature. The model is designed as a combination of a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network along with a Multi-Layer Perceptron (MLP). To show the efficacy of the proposed model, we compare it with six baseline models and other state-of-the-art techniques on the "chemicals-and-disease-DFE" dataset, a free-text dataset created by Li et al. from the BioCreative V Chemical Disease Relation dataset. Results show that the proposed model attains the highest F1-score for individual classes, demonstrating its effectiveness in handling the class imbalance problem in the dataset. To further demonstrate the efficacy of the proposed model, we present results on the BioCreative V dataset and two Protein-Protein Interaction Identification (PPI) datasets, viz., AiMed and BioInfer. All these results are also compared with those of state-of-the-art models.