Domain adaptation for semantic role labeling in the biomedical domain.

Affiliation

NUS Graduate School for Integrative Sciences and Engineering, Singapore 117456, Singapore.

Publication

Bioinformatics. 2010 Apr 15;26(8):1098-104. doi: 10.1093/bioinformatics/btq075. Epub 2010 Feb 23.

DOI:10.1093/bioinformatics/btq075
PMID:20179074
Abstract

MOTIVATION

Semantic role labeling (SRL) is a natural language processing (NLP) task that extracts a shallow meaning representation from free text sentences. Several efforts to create SRL systems for the biomedical domain have been made during the last few years. However, state-of-the-art SRL relies on manually annotated training instances, which are rare and expensive to prepare. In this article, we address SRL for the biomedical domain as a domain adaptation problem to leverage existing SRL resources from the newswire domain.
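
A "shallow meaning representation" in SRL is typically a predicate plus labeled argument spans. The following is a minimal Python sketch of that idea. It assumes PropBank-style role labels (Arg0, Arg1, ArgM-LOC). The example sentence, the classes `Argument` and `Frame`, and the helper `render` are hypothetical illustrations and are not taken from BioKIT or the BioProp corpus.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    role: str   # PropBank-style role label (assumed), e.g. "Arg0", "Arg1", "ArgM-LOC"
    start: int  # index of the first token in the span (inclusive)
    end: int    # index one past the last token in the span (exclusive)


@dataclass
class Frame:
    predicate_index: int  # token index of the predicate (the verb)
    arguments: list[Argument] = field(default_factory=list)


def render(tokens: list[str], frame: Frame) -> dict[str, str]:
    """Map the predicate ("V") and each role label to the text it covers."""
    spans = {"V": tokens[frame.predicate_index]}
    for arg in frame.arguments:
        spans[arg.role] = " ".join(tokens[arg.start:arg.end])
    return spans


# Hypothetical biomedical sentence with a hand-written (not system-produced) frame.
tokens = "IL-2 activates STAT5 in T cells".split()
frame = Frame(
    predicate_index=1,
    arguments=[
        Argument("Arg0", 0, 1),      # agent: "IL-2"
        Argument("Arg1", 2, 3),      # patient: "STAT5"
        Argument("ArgM-LOC", 3, 6),  # location modifier: "in T cells"
    ],
)
print(render(tokens, frame))
```

An SRL system's job is to produce frames like this automatically for every predicate in a sentence; the hand-written frame above only shows the target output shape.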

RESULTS

We evaluate the performance of three recently proposed domain adaptation algorithms for SRL. Our results show that by using domain adaptation, the cost of developing an SRL system for the biomedical domain can be reduced significantly. Using domain adaptation, our system can achieve 97% of the performance with as little as 60 annotated target domain abstracts.
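
The abstract does not name the three algorithms it evaluates. Purely as an illustration of how newswire (source) data can be combined with a small amount of biomedical (target) data, here is a minimal sketch of feature augmentation (Daumé III, 2007), a widely used domain adaptation technique. Each feature is copied into a shared version and a domain-specific version, so a single linear classifier can learn which features transfer across domains. This is not presented as the authors' method; the helper `augment` and all feature names are hypothetical.

```python
def augment(features: dict[str, float], domain: str) -> dict[str, float]:
    """Feature augmentation: emit a shared copy and a domain-specific copy of each feature."""
    if domain not in ("source", "target"):
        raise ValueError("domain must be 'source' or 'target'")
    augmented: dict[str, float] = {}
    for name, value in features.items():
        augmented[f"shared:{name}"] = value    # weight learned from both domains
        augmented[f"{domain}:{name}"] = value  # weight learned only from this domain
    return augmented


# Hypothetical toy instances: (features of a candidate argument, role label).
newswire = [({"pred=say": 1.0, "path=NP>S>VP": 1.0}, "Arg0")]
biomedical = [({"pred=bind": 1.0, "path=NP>S>VP": 1.0}, "Arg0")]

# Pool both domains into a single training set for one classifier.
train = [(augment(f, "source"), y) for f, y in newswire] + [
    (augment(f, "target"), y) for f, y in biomedical
]

for feats, label in train:
    print(label, sorted(feats))
```

The augmented dictionaries can be passed to any linear learner, such as a maximum-entropy model. One intuition for why few annotated target abstracts may suffice under this scheme is that the shared weights are estimated mostly from the larger newswire data, leaving the target-specific weights to capture only what differs.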

AVAILABILITY

Our BioKIT system that performs SRL in the biomedical domain as described in this article is implemented in Python and C and operates under the Linux operating system. BioKIT can be downloaded at http://nlp.comp.nus.edu.sg/software. The domain adaptation software is available for download at http://www.mysmu.edu/faculty/jingjiang/software/DALR.html. The BioProp corpus is available from the Linguistic Data Consortium http://www.ldc.upenn.edu.

Similar articles

1
Domain adaptation for semantic role labeling in the biomedical domain.
Bioinformatics. 2010 Apr 15;26(8):1098-104. doi: 10.1093/bioinformatics/btq075. Epub 2010 Feb 23.
2
Domain adaptation for semantic role labeling of clinical text.
J Am Med Inform Assoc. 2015 Sep;22(5):967-79. doi: 10.1093/jamia/ocu048. Epub 2015 Jun 10.
3
BIOSMILE: a semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features.
BMC Bioinformatics. 2007 Sep 1;8:325. doi: 10.1186/1471-2105-8-325.
4
The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text.
J Biomed Inform. 2003 Dec;36(6):462-77. doi: 10.1016/j.jbi.2003.11.003.
5
Recognizing names in biomedical texts: a machine learning approach.
Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.
6
Automatic term list generation for entity tagging.
Bioinformatics. 2006 Mar 15;22(6):651-7. doi: 10.1093/bioinformatics/bti733. Epub 2005 Oct 25.
7
Developing a corpus of clinical notes manually annotated for part-of-speech.
Int J Med Inform. 2006 Jun;75(6):418-29. doi: 10.1016/j.ijmedinf.2005.08.006. Epub 2005 Sep 19.
8
Gene symbol disambiguation using knowledge-based profiles.
Bioinformatics. 2007 Apr 15;23(8):1015-22. doi: 10.1093/bioinformatics/btm056. Epub 2007 Feb 21.
9
Semi-automatic conversion of BioProp semantic annotation to PASBio annotation.
BMC Bioinformatics. 2008 Dec 12;9 Suppl 12(Suppl 12):S18. doi: 10.1186/1471-2105-9-S12-S18.
10
A hybrid method for relation extraction from biomedical literature.
Int J Med Inform. 2006 Jun;75(6):443-55. doi: 10.1016/j.ijmedinf.2005.06.010. Epub 2005 Aug 10.

Cited by

1
A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora.
BMC Med Inform Decis Mak. 2022 Sep 6;22(Suppl 3):235. doi: 10.1186/s12911-022-01967-7.
2
ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata.
Int J Med Inform. 2019 Jan;121:10-18. doi: 10.1016/j.ijmedinf.2018.10.009. Epub 2018 Nov 3.
3
Adapting Word Embeddings from Multiple Domains to Symptom Recognition from Psychiatric Notes.
AMIA Jt Summits Transl Sci Proc. 2018 May 18;2017:281-289. eCollection 2018.
4
Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation.
AMIA Annu Symp Proc. 2018 Apr 16;2017:1070-1079. eCollection 2017.
5
Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach.
JMIR Med Inform. 2017 Oct 31;5(4):e42. doi: 10.2196/medinform.8531.
6
Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.
BMC Bioinformatics. 2017 Aug 17;18(1):372. doi: 10.1186/s12859-017-1775-9.
7
Knowledge-transfer learning for prediction of matrix metalloprotease substrate-cleavage sites.
Sci Rep. 2017 Jul 18;7(1):5755. doi: 10.1038/s41598-017-06219-7.
8
Semantic Role Labeling of Clinical Text: Comparing Syntactic Parsers and Features.
AMIA Annu Symp Proc. 2017 Feb 10;2016:1283-1292. eCollection 2016.
9
Extractive text summarization system to aid data extraction from full text in systematic review development.
J Biomed Inform. 2016 Dec;64:265-272. doi: 10.1016/j.jbi.2016.10.014. Epub 2016 Oct 27.
10
BelSmile: a biomedical semantic role labeling approach for extracting biological expression language from text.
Database (Oxford). 2016 May 12;2016. doi: 10.1093/database/baw064. Print 2016.