Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France.
Université Paris-Saclay, CNRS, LIMSI, Orsay, France.
BMC Bioinformatics. 2020 Dec 29;21(Suppl 23):579. doi: 10.1186/s12859-020-03886-8.
Entity normalization is an important information extraction task which has gained renewed attention in the last decade, particularly in the biomedical and life science domains. In these domains, and more generally in all specialized domains, this task is still challenging for the latest machine learning-based approaches, which have difficulty handling highly multi-class and few-shot learning problems. To address this issue, we propose C-Norm, a new neural approach which synergistically combines standard and weak supervision, ontological knowledge integration and distributional semantics.
Our approach greatly outperforms all methods evaluated on the Bacteria Biotope datasets of BioNLP Open Shared Tasks 2019, without integrating any manually designed domain-specific rules.
Our results show that relatively shallow neural network methods can perform well in domains that present highly multi-class and few-shot learning problems.
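To make the idea of a shallow model that combines ontological knowledge with distributional semantics more concrete, the following is a minimal sketch and not the C-Norm architecture itself, which the abstract does not detail. It assumes a hypothetical four-concept ontology, made-up three-dimensional mention embeddings, concept vectors that encode each concept together with its ancestors, and a plain least-squares linear map standing in for a trained neural layer. All names (`ONTOLOGY`, `concept_vector`, `fit_projection`, `normalize`) are illustrative.

```python
import numpy as np

# Hypothetical toy ontology (concept -> parents); not taken from the paper.
ONTOLOGY = {
    "habitat": [],
    "animal habitat": ["habitat"],
    "gut": ["animal habitat"],
    "soil": ["habitat"],
}
CONCEPTS = sorted(ONTOLOGY)


def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in ONTOLOGY[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def concept_vector(concept):
    """Binary vector with a 1 for the concept and for each of its ancestors."""
    anc = ancestors(concept)
    return np.array([1.0 if c in anc else 0.0 for c in CONCEPTS])


# One row per concept, in the order of CONCEPTS.
CONCEPT_MATRIX = np.stack([concept_vector(c) for c in CONCEPTS])


def fit_projection(mention_embeddings, gold_concepts):
    """Least-squares linear map from mention space to concept-vector space.

    This is an assumed stand-in for a trained shallow neural layer.
    """
    X = np.asarray(mention_embeddings, dtype=float)
    Y = np.stack([concept_vector(c) for c in gold_concepts])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def normalize(mention_embedding, W):
    """Project a mention and return the concept with the highest cosine similarity."""
    projected = np.asarray(mention_embedding, dtype=float) @ W
    norms = np.linalg.norm(CONCEPT_MATRIX, axis=1) * np.linalg.norm(projected)
    sims = (CONCEPT_MATRIX @ projected) / (norms + 1e-12)
    return CONCEPTS[int(np.argmax(sims))]


# Made-up training data: one embedded mention per concept (few-shot setting).
train_X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
train_y = ["gut", "soil", "animal habitat"]
W = fit_projection(train_X, train_y)

print(normalize([0.9, 0.1, 0.0], W))
print(normalize([0.1, 0.9, 0.0], W))
```

Because the target vectors share ancestor dimensions, a prediction that misses the exact concept still tends to land near an ontologically related one, which is one plausible reason such encodings help when most classes have very few training examples.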