School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China.
Neural Netw. 2021 Jul;139:358-370. doi: 10.1016/j.neunet.2021.03.030. Epub 2021 Apr 1.
As a major method for relation extraction, distantly supervised relation extraction (DSRE) suffers from the noisy label problem and the class imbalance problem (both problems are also common in many other NLP tasks, e.g., text classification). However, there appears to be no existing research in DSRE or other NLP tasks that solves both problems simultaneously, which is a significant gap in the related literature. In this paper, we propose a loss function that is robust to noisy labels and effective on class-imbalanced datasets. More specifically, we first quantify the negative impacts of the noisy label and class imbalance problems, and then construct a loss function that minimizes these negative impacts through a linear programming method. To the best of our knowledge, this is the first attempt to address the noisy label problem and the class imbalance problem simultaneously. We evaluated the constructed loss function on a distantly labeled dataset, our artificially noised dataset, the human-annotated DocRED dataset, and an artificially noised version of the CoNLL 2003 dataset. Experimental results indicate that a DNN model adopting the constructed loss function outperforms models that adopt state-of-the-art noisy-label-robust or negative-sample-robust loss functions.
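The abstract does not spell out the constructed loss, so for orientation only, here is a minimal PyTorch-style sketch of a loss that addresses both issues at once: a noise-robust term (generalized cross-entropy) combined with per-class reweighting for imbalance. The function name `weighted_gce_loss` and the parameters `class_weights` and `q` are illustrative assumptions, not the paper's linear-programming-based construction.

```python
import torch
import torch.nn.functional as F

def weighted_gce_loss(logits, targets, class_weights, q=0.7):
    """Class-weighted generalized cross-entropy: a stand-in sketch, not the paper's loss.

    The GCE term interpolates between cross-entropy (q -> 0) and MAE (q = 1),
    which gives robustness to noisy labels; per-class weights counteract
    class imbalance.
    """
    probs = F.softmax(logits, dim=-1)                           # (N, C) class probabilities
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # probability of the labeled class
    gce = (1.0 - p_true.clamp_min(1e-8).pow(q)) / q             # noise-robust per-example loss
    w = class_weights[targets]                                  # imbalance reweighting per example
    return (w * gce).sum() / w.sum()
```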