Using distant supervision to augment manually annotated data for relation extraction.

Author affiliations

Department of Computer and Information Science, University of Delaware, Newark, Delaware, United States of America.

Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America.

Publication information

PLoS One. 2019 Jul 30;14(7):e0216913. doi: 10.1371/journal.pone.0216913. eCollection 2019.

Abstract

Significant progress has been made recently in applying deep learning to natural language processing tasks. However, deep learning models typically require large amounts of annotated training data, whereas only small labeled datasets are available for many natural language processing tasks in the biomedical literature. Building large datasets for deep learning is expensive because it involves considerable human effort and usually requires domain expertise in specialized fields. In this work, we consider augmenting manually annotated data with large amounts of data obtained through distant supervision. Because data obtained by distant supervision is often noisy, we first apply heuristics to remove some of the incorrect annotations. Then, using methods inspired by transfer learning, we show that the resulting models outperform models trained on the original manually annotated sets.
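
To make the idea of distant supervision followed by heuristic noise removal concrete, here is a minimal sketch. It is not the authors' pipeline: the abstract does not specify their heuristics, so the token-distance filter, the `max_gap` threshold, the function names, and the toy entity names (`gene_a`, `protein_b`) are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Set, Tuple


class Example(NamedTuple):
    tokens: Tuple[str, ...]
    head: str
    tail: str
    label: int  # 1 = relation assumed to hold, 0 = assumed not to hold


def distant_label(sentences: List[str],
                  candidate_pairs: List[Tuple[str, str]],
                  known_pairs: Set[Tuple[str, str]]) -> List[Example]:
    """Label every sentence that mentions both entities of a candidate pair.

    A pair listed in `known_pairs` (a stand-in for a knowledge base) is
    labeled positive wherever it co-occurs; this is the source of the noise.
    """
    examples = []
    for sentence in sentences:
        tokens = tuple(sentence.lower().split())
        for head, tail in candidate_pairs:
            if head in tokens and tail in tokens:
                label = 1 if (head, tail) in known_pairs else 0
                examples.append(Example(tokens, head, tail, label))
    return examples


def drop_distant_positives(examples: List[Example],
                           max_gap: int = 5) -> List[Example]:
    """Hypothetical heuristic: discard positives whose entities are far apart."""
    kept = []
    for ex in examples:
        gap = abs(ex.tokens.index(ex.head) - ex.tokens.index(ex.tail))
        if ex.label == 0 or gap <= max_gap:
            kept.append(ex)
    return kept


# Toy data, invented for this sketch.
sentences = [
    "gene_a represses protein_b in tumor cells",
    "gene_a was measured in twelve samples collected last year "
    "while protein_b was not assayed",
]
known = {("gene_a", "protein_b")}

noisy = distant_label(sentences, [("gene_a", "protein_b")], known)
clean = drop_distant_positives(noisy)
```

In this toy run both sentences receive a positive label from the knowledge base, and the filter keeps only the first, where the two entities sit close together. The cleaned examples would then be combined with the manually annotated set for training; how that combination is done (the transfer-learning-inspired step) is not detailed in the abstract and is not sketched here.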

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f4c/6667146/876064556cf0/pone.0216913.g001.jpg
