Wang Anli, Li Linyi, Wu Xuehong, Zhu Jianping, Yu Shanshan, Chen Xi, Li Jianhua, Zhu Hongtao
Information Center, The Third Xiangya Hospital, Central South University, Changsha, China.
School of Computer Science, Central South University, Changsha, China.
Ann Transl Med. 2022 Oct;10(19):1061. doi: 10.21037/atm-22-3991.
Entity relation extraction is an important task in constructing professional knowledge graphs in the medical field. Research on entity relation extraction from academic books in the medical field has shown that the number of instances varies greatly across entity relation classes. The result is a typical unbalanced data set whose relations are difficult to recognize but which has research value.
In this article, we propose a new entity relation extraction method based on data augmentation. The probability that a text is augmented during training is calculated from the distribution of the individual entity relation classes in the data set. In text-oriented data augmentation, different augmentation methods perform differently in different language environments, so reinforcement learning is used to determine which data augmentation method to apply in the current language environment. This strategy was applied to entity relation extraction from a medical professional book, and different data augmentation strategies (i.e., no data augmentation, traditional data augmentation, and reinforcement learning-based data augmentation) were compared under the same neural network model.
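The two mechanisms described above can be illustrated with a minimal sketch. The abstract gives neither the probability formula nor the reinforcement learning algorithm, so everything here is an assumption: the rule p(c) = 1 - count(c) / max_count, the epsilon-greedy bandit standing in for the reinforcement learning component, the use of a hashable key as the "language environment", and the method names `synonym_swap` and `random_deletion` are all hypothetical.

```python
import random
from collections import Counter


def augmentation_probabilities(labels):
    """Map each relation class to a probability of augmenting its samples.

    Assumed rule (not from the paper): p(c) = 1 - count(c) / max_count,
    so the largest class is never augmented and rarer classes are
    augmented more often.
    """
    counts = Counter(labels)
    max_count = max(counts.values())
    return {c: 1.0 - n / max_count for c, n in counts.items()}


class EpsilonGreedySelector:
    """Hypothetical stand-in for the reinforcement learning component.

    Keeps a running mean reward per (context, method) pair and picks the
    best-known method for a context, exploring with probability epsilon.
    """

    def __init__(self, methods, epsilon=0.1, seed=0):
        self.methods = list(methods)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {}  # (context, method) -> running mean reward
        self.count = {}  # (context, method) -> number of updates

    def select(self, context):
        # Explore: pick a random method with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.methods)
        # Exploit: pick the method with the highest mean reward so far.
        return max(self.methods, key=lambda m: self.value.get((context, m), 0.0))

    def update(self, context, method, reward):
        # Incremental mean: new = old + (reward - old) / n
        key = (context, method)
        n = self.count.get(key, 0) + 1
        self.count[key] = n
        old = self.value.get(key, 0.0)
        self.value[key] = old + (reward - old) / n


if __name__ == "__main__":
    # Toy unbalanced label set: 8 samples of one relation, 2 of another.
    labels = ["treats"] * 8 + ["causes"] * 2
    probs = augmentation_probabilities(labels)

    selector = EpsilonGreedySelector(["synonym_swap", "random_deletion"], epsilon=0.0)
    # The reward could be, for example, a validation-score change after training.
    selector.update("causes", "random_deletion", reward=1.0)
    print(probs, selector.select("causes"))
```

In this sketch the context is simply the relation class; a real system would need a richer representation of the language environment and a reward signal tied to model performance, neither of which the abstract specifies.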
The deep-learning model using data augmentation was better than the model without data augmentation: data augmentation significantly improved the evaluation indicators of the relation classes with low data volumes in the unbalanced data set and slightly improved those of the relation classes with sufficient features and large data volumes. Additionally, the deep-learning model using reinforcement learning-based data augmentation was superior to the deep-learning model using traditional data augmentation. After reinforcement learning-based data augmentation was applied, the evaluation indicators of multiple relation classes were much better than they were without it.
For unbalanced data sets, data augmentation can effectively improve the ability of the deep-learning model to obtain data features, and reinforcement learning-based data augmentation can further enhance this ability. Our experiments confirmed the superiority of reinforcement learning-based data augmentation.