Using distant supervision to augment manually annotated data for relation extraction.

Affiliations

Department of Computer and Information Science, University of Delaware, Newark, Delaware, United States of America.

Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America.

Publication information

PLoS One. 2019 Jul 30;14(7):e0216913. doi: 10.1371/journal.pone.0216913. eCollection 2019.


DOI:10.1371/journal.pone.0216913
PMID:31361753
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6667146/
Abstract

Significant progress has recently been made in applying deep learning to natural language processing tasks. However, deep learning models typically require a large amount of annotated training data, while only small labeled datasets are available for many natural language processing tasks in biomedical literature. Building large datasets for deep learning is expensive, since it involves considerable human effort and usually requires domain expertise in specialized fields. In this work, we consider augmenting manually annotated data with large amounts of data obtained using distant supervision. Because data obtained by distant supervision is often noisy, we first apply heuristics to remove some of the incorrect annotations. Then, using methods inspired by transfer learning, we show that the resulting models outperform models trained on the original manually annotated sets.
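
The abstract does not spell out the labeling step or the filtering heuristics, so the following is only a minimal sketch of the general idea of distant supervision, not the authors' method. It assumes a hypothetical knowledge base `KNOWN_PAIRS` of related entity pairs and a hypothetical noise heuristic (`max_gap`, a token-distance cutoff) chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """A sentence together with one pair of entity mentions found in it."""
    sentence: str
    entity_a: str
    entity_b: str


# Hypothetical knowledge base of entity pairs assumed to be related.
KNOWN_PAIRS = {("BRCA1", "BARD1")}


def distant_label(c: Candidate) -> int:
    """Return 1 if the pair appears in the knowledge base (either order), else 0."""
    pair = (c.entity_a, c.entity_b)
    return int(pair in KNOWN_PAIRS or pair[::-1] in KNOWN_PAIRS)


def token_gap(c: Candidate) -> Optional[int]:
    """Number of token positions between the two mentions, or None if one is missing."""
    tokens = c.sentence.split()
    if c.entity_a not in tokens or c.entity_b not in tokens:
        return None
    return abs(tokens.index(c.entity_a) - tokens.index(c.entity_b))


def keep(c: Candidate, label: int, max_gap: int) -> bool:
    """Illustrative heuristic: drop positives whose mentions are far apart."""
    if label == 0:
        return True
    gap = token_gap(c)
    return gap is not None and gap <= max_gap


def build_distant_set(
    candidates: List[Candidate], max_gap: int = 10
) -> List[Tuple[Candidate, int]]:
    """Label every candidate from the knowledge base, then filter likely noise."""
    labeled = [(c, distant_label(c)) for c in candidates]
    return [(c, y) for c, y in labeled if keep(c, y, max_gap)]
```

In a transfer-learning style setup, a model could first be trained on the output of `build_distant_set` and then fine-tuned on the smaller manually annotated set. That ordering is one common arrangement and is an assumption here, since the abstract does not specify it.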

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f4c/6667146/92b8c54a5b43/pone.0216913.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f4c/6667146/876064556cf0/pone.0216913.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f4c/6667146/561058afa15a/pone.0216913.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f4c/6667146/b68211b2bcf7/pone.0216913.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f4c/6667146/d5e57c06c708/pone.0216913.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f4c/6667146/36723d5df52e/pone.0216913.g005.jpg

Similar articles

[1]
Using distant supervision to augment manually annotated data for relation extraction.

PLoS One. 2019-7-30

[2]
Identification of asthma control factor in clinical notes using a hybrid deep learning model.

BMC Med Inform Decis Mak. 2021-11-9

[3]
A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature.

Interdiscip Sci. 2024-6

[4]
Comparison of radiologist versus natural language processing-based image annotations for deep learning system for tuberculosis screening on chest radiographs.

Clin Imaging. 2022-7

[5]
Automatic de-identification of French electronic health records: a cost-effective approach exploiting distant supervision and deep learning models.

BMC Med Inform Decis Mak. 2024-2-16

[6]
Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems.

Database (Oxford). 2018-1-1

[7]
Syntax-based transfer learning for the task of biomedical relation extraction.

J Biomed Semantics. 2021-8-18

[8]
Domain transformation on biological event extraction by learning methods.

J Biomed Inform. 2019-6-18

[9]
Facilitating information extraction without annotated data using unsupervised and positive-unlabeled learning.

AMIA Annu Symp Proc. 2020

[10]
Learning to explain is a good biomedical few-shot learner.

Bioinformatics. 2024-10-1

Cited by

[1]
Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction.

BMC Bioinformatics. 2022-4-4

[2]
Identification of asthma control factor in clinical notes using a hybrid deep learning model.

BMC Med Inform Decis Mak. 2021-11-9

[3]
Deep Learning Identification of Asthma Inhaler Techniques in Clinical Notes.

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2020

[4]
UMLS-based data augmentation for natural language processing of clinical research literature.

J Am Med Inform Assoc. 2021-3-18
