Automatic extension of corpora from the intelligent ensembling of eHealth knowledge discovery systems outputs.

Affiliations

School of Math and Computer Science, University of Habana, La Habana 10200, Cuba.

University Institute for Computing Research (IUII), University of Alicante, Alicante 03690, Spain; Department of Language and Computing Systems, University of Alicante, Alicante 03690, Spain.

Publication information

J Biomed Inform. 2021 Apr;116:103716. doi: 10.1016/j.jbi.2021.103716. Epub 2021 Feb 26.

DOI: 10.1016/j.jbi.2021.103716
PMID: 33647519

Abstract

Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guaranteeing quality, remains an open and important challenge for the research community. In this paper, we present a set of ensembling strategies oriented toward entity and relation extraction tasks. The main goal is to combine several automatically annotated versions of corpora to produce a single version with improved quality. An ensembler is built by exploring a configuration space in search of the combination that maximizes the fitness of the ensembled collection according to a reference collection. The eHealth-KD 2019 challenge was chosen for the case study. The submitted systems' outputs were ensembled, resulting in the construction of an automatically annotated collection of 8000 sentences. We show that using this collection as additional training input for a baseline algorithm has a positive impact on its performance. Additionally, the ensembling pipeline was used as a participant system in the 2020 edition of the challenge. The ensembled run achieved a slightly better performance than the individual runs.
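
The abstract does not spell out the ensembling strategies or the fitness function, so the following is only a minimal sketch of the general idea: several systems' annotations are combined by weighted voting, and a small configuration space (per-system weights and a vote threshold) is searched for the combination that scores best against a reference collection. The function names, the use of F1 as the fitness measure, the grid of weights and thresholds, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def ensemble(submissions, weights, threshold):
    """Keep each annotation whose weighted vote share reaches the threshold."""
    votes = Counter()
    for system, annotations in submissions.items():
        for annotation in annotations:
            votes[annotation] += weights[system]
    total = sum(weights.values())
    if total == 0:
        return set()
    return {a for a, v in votes.items() if v > 0 and v / total >= threshold}


def f1(predicted, reference):
    """F1 of a predicted annotation set against the reference set (assumed fitness)."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def search_configuration(submissions, reference,
                         weight_options=(0.0, 1.0),
                         thresholds=(0.25, 0.5, 0.75)):
    """Exhaustively search the (hypothetical) configuration grid for the best fitness."""
    systems = sorted(submissions)
    best_score, best_weights, best_threshold = -1.0, None, None
    for combo in product(weight_options, repeat=len(systems)):
        if sum(combo) == 0:
            continue  # at least one system must take part in the vote
        weights = dict(zip(systems, combo))
        for threshold in thresholds:
            score = f1(ensemble(submissions, weights, threshold), reference)
            if score > best_score:
                best_score, best_weights, best_threshold = score, weights, threshold
    return best_score, best_weights, best_threshold


# Toy data: each annotation is (sentence_id, start, end, label). Illustrative only.
A, B, C = (0, 0, 4, "Concept"), (0, 10, 16, "Action"), (1, 3, 9, "Concept")
X, Y, Z = (0, 5, 9, "Concept"), (1, 0, 2, "Action"), (1, 12, 15, "Concept")

reference = {A, B, C}
submissions = {
    "system_a": {A, B, X},
    "system_b": {A, C, Y},
    "system_c": {B, C, Z},
}

score, weights, threshold = search_configuration(submissions, reference)
print(score, weights, threshold)
```

In this toy case each system contributes one spurious annotation that no other system agrees with, so the search settles on equal weights with a majority threshold. Once a configuration has been selected on the reference collection, the same `ensemble` call could be applied to system outputs on unlabeled sentences to produce an automatically annotated collection of the kind the abstract describes.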

Cited by

1
Internet of medical things and blockchain-enabled patient-centric agent through SDN for remote patient monitoring in 5G network.
Sci Rep. 2024 Mar 4;14(1):5297. doi: 10.1038/s41598-024-55662-w.