Jeon Sung Hwan, Cho Sungzoon
Department of Industrial Engineering, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, Republic of Korea.
Institute for Industrial Systems Innovation, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, Republic of Korea.
Neural Process Lett. 2022 Dec 21:1-22. doi: 10.1007/s11063-022-11102-2.
Discriminating matched named entity pairs, or identifying an entity's canonical form, is critical in text mining tasks. More precise named entity normalization benefits subsequent text analytics applications. We built a named entity normalization model with a novel edge weight updating neural network. We then verified the model's performance on the NCBI disease, BC5CDR disease, and BC5CDR chemical datasets, which are widely used for named entity normalization in bioinformatics. We also tested the model on our own financial named entity normalization dataset to validate its efficacy for more general applications; using this constructed dataset, we differentiate named entity pairs. Across all four datasets, our proposed model achieved state-of-the-art named entity normalization results on various evaluation metrics.