Subsequence and distant supervision based active learning for relation extraction of Chinese medical texts.

Affiliation

School of Information Science and Technology, East China University of Science and Technology, Shanghai, 200237, China.

Publication information

BMC Med Inform Decis Mak. 2023 Feb 14;23(1):34. doi: 10.1186/s12911-023-02127-1.

DOI:10.1186/s12911-023-02127-1
PMID:36788504
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9926422/
Abstract

In recent years, relation extraction from unstructured text has become an important task in medical research. However, relation extraction requires a large labeled corpus, and manually annotating sequences is time-consuming and expensive. Efficient and economical annotation methods are therefore needed to ensure relation extraction performance. This paper proposes an active learning method based on subsequences and distant supervision. Instead of sampling full sentences as in traditional active learning, the method selects information-rich subsequences as the sampling unit for annotation. In addition, the method stores the labeled subsequence texts and their corresponding labels in a dictionary that is continuously updated and maintained, and pre-labels the unlabeled set through text matching, following the idea of distant supervision. Finally, the method is combined with a Chinese-RoBERTa-CRF model for relation extraction from Chinese medical texts. Experiments on the CMeIE dataset show that the method achieves the best performance compared with existing methods, and the best F1 value obtained across the different sampling strategies is 55.96%.
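The two ideas in the abstract, sampling a subsequence rather than a full sentence and pre-labeling unlabeled text from a dictionary of already-labeled subsequences, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the abstract does not say how "information-rich" is scored, so the sketch takes hypothetical per-token uncertainty scores and picks the fixed-width window with the highest total; the exact-match, longest-first rule in `prelabel` and the `B-SUBJ`/`I-SUBJ` tag names are likewise invented.

```python
from typing import Dict, List, Optional, Tuple

Labels = List[Optional[str]]


def select_subsequence(uncertainty: List[float], window: int) -> Tuple[int, int]:
    """Return [start, end) of the window with the highest total uncertainty.

    Assumption: token-level uncertainty scores come from the current model;
    the scoring function itself is outside this sketch.
    """
    if not 0 < window <= len(uncertainty):
        raise ValueError("window must be between 1 and the sequence length")
    current = sum(uncertainty[:window])
    best_start, best_sum = 0, current
    for start in range(1, len(uncertainty) - window + 1):
        # Slide the window one token to the right.
        current += uncertainty[start + window - 1] - uncertainty[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start, best_start + window


def prelabel(tokens: List[str],
             dictionary: Dict[Tuple[str, ...], List[str]]) -> Labels:
    """Copy stored labels onto every exact occurrence of a known subsequence.

    Longer dictionary entries are matched first, and positions that already
    carry a label are never overwritten. Unmatched tokens stay None.
    """
    labels: Labels = [None] * len(tokens)
    for key in sorted(dictionary, key=len, reverse=True):
        n = len(key)
        for i in range(len(tokens) - n + 1):
            span_free = all(label is None for label in labels[i:i + n])
            if span_free and tuple(tokens[i:i + n]) == key:
                labels[i:i + n] = dictionary[key]
    return labels


# Hypothetical example: one annotated subsequence is stored in the dictionary,
# then reused to pre-label a different unlabeled sentence.
sentence = list("糖尿病可引起肾病")
scores = [0.9, 0.8, 0.7, 0.1, 0.1, 0.1, 0.2, 0.2]  # made-up uncertainties
start, end = select_subsequence(scores, window=3)

dictionary: Dict[Tuple[str, ...], List[str]] = {}
# Pretend an annotator labeled the selected span with illustrative tags.
dictionary[tuple(sentence[start:end])] = ["B-SUBJ", "I-SUBJ", "I-SUBJ"]

unlabeled = list("糖尿病患者")
print(prelabel(unlabeled, dictionary))
```

In an active learning loop of this kind, the dictionary would grow after each annotation round, so later rounds would pre-label more of the unlabeled pool before any new manual annotation is requested.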

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/244e/9926645/a49d390159a6/12911_2023_2127_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/244e/9926645/70de9dd52628/12911_2023_2127_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/244e/9926645/17de3127d64f/12911_2023_2127_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/244e/9926645/af43ce0c97fb/12911_2023_2127_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/244e/9926645/6099b0375ea8/12911_2023_2127_Fig5_HTML.jpg