Semantic role labeling for protein transport predicates.

Authors

Bethard Steven, Lu Zhiyong, Martin James H, Hunter Lawrence

Affiliation

Computer Science Department, University of Colorado at Boulder, Boulder, CO, USA.

Publication

BMC Bioinformatics. 2008 Jun 11;9:277. doi: 10.1186/1471-2105-9-277.

DOI:10.1186/1471-2105-9-277
PMID:18547432
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2474622/
Abstract

BACKGROUND

Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in recent years, but mostly with data in newswire domains. Here, we report on an SRL model for identifying the semantic roles of biomedical predicates describing protein transport in GeneRIFs - manually curated sentences focusing on gene functions. To avoid the computational cost of syntactic parsing, and because the boundaries of our protein transport roles often did not match up with syntactic phrase boundaries, we approached this problem with a word-chunking paradigm and trained support vector machine classifiers to classify words as being at the beginning, inside or outside of a protein transport role.
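The beginning/inside/outside labeling described above can be illustrated with a minimal sketch. It only shows how role spans map to per-word tags and back; the SVM classifier that predicts the tags is not shown. The sentence, the span boundaries, and the function names `spans_to_bio` and `bio_to_spans` are illustrative assumptions, not taken from the paper or its data set.

```python
from typing import List, Tuple

# (start word index, end word index exclusive, role label); a hypothetical format
Span = Tuple[int, int, str]


def spans_to_bio(n_words: int, spans: List[Span]) -> List[str]:
    """Encode role spans as one B-/I-/O tag per word."""
    tags = ["O"] * n_words
    for start, end, role in spans:
        tags[start] = f"B-{role}"
        for i in range(start + 1, end):
            tags[i] = f"I-{role}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a tag sequence back into role spans."""
    spans: List[Span] = []
    start, role = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and role == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, role))
            start, role = None, None
        if tag.startswith("B-"):
            start, role = i, tag[2:]
    return spans


# Made-up example sentence, not drawn from the annotated GeneRIFs.
words = ["Bax", "translocates", "from", "the", "cytosol", "to", "mitochondria"]
example_spans = [(0, 1, "PATIENT"), (3, 5, "ORIGIN"), (6, 7, "DESTINATION")]
example_tags = spans_to_bio(len(words), example_spans)
for word, tag in zip(words, example_tags):
    print(f"{word}\t{tag}")
```

Because every word receives exactly one tag, a per-word classifier can label roles without a syntactic parse, and role boundaries need not coincide with phrase boundaries.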

RESULTS

We collected a set of 837 GeneRIFs describing movements of proteins between cellular components, whose predicates were annotated for the semantic roles AGENT, PATIENT, ORIGIN and DESTINATION. We trained these models with the features of previous word-chunking models, features adapted from phrase-chunking models, and features derived from an analysis of our data. Our models were able to label protein transport semantic roles with 87.6% precision and 79.0% recall when using manually annotated protein boundaries, and 87.0% precision and 74.5% recall when using automatically identified ones.
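As a rough sketch of how precision and recall over role spans can be computed, the snippet below counts a predicted role as correct only when its boundaries and label both match a gold annotation exactly. The exact-match criterion, the span format, and the function name `precision_recall` are assumptions made for illustration; the abstract does not state the matching criterion used.

```python
from typing import Iterable, Tuple

# (start word index, end word index exclusive, role label); a hypothetical format
Span = Tuple[int, int, str]


def precision_recall(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float]:
    """Span-level precision and recall under exact boundary-and-label matching."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall


# Made-up annotations: the ORIGIN prediction misses one word, DESTINATION is not predicted.
gold_spans = [(0, 1, "PATIENT"), (3, 5, "ORIGIN"), (6, 7, "DESTINATION")]
pred_spans = [(0, 1, "PATIENT"), (4, 5, "ORIGIN")]
p, r = precision_recall(gold_spans, pred_spans)
print(f"precision={p:.3f} recall={r:.3f}")
```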

CONCLUSION

We successfully adapted the word-chunking classification paradigm to semantic role labeling, applying it to a new domain with predicates completely absent from any previous studies. By combining the traditional word and phrasal role labeling features with biomedical features like protein boundaries and MEDPOST part of speech tags, we were able to address the challenges posed by the new domain data and subsequently build robust models that achieved F-measures as high as 83.1. This system for extracting protein transport information from GeneRIFs performs well even with proteins identified automatically, and is therefore more robust than the rule-based methods previously used to extract protein transport roles.
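The reported F-measure is consistent with the balanced harmonic mean of the reported precision and recall. The short check below assumes that definition (F1); the function name `f_measure` is illustrative, and the value for automatically identified proteins is derived here from the reported precision and recall, since the abstract does not state it.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Manually annotated protein boundaries: 87.6% precision, 79.0% recall.
f_manual = round(f_measure(87.6, 79.0), 1)
# Automatically identified protein boundaries: 87.0% precision, 74.5% recall.
f_auto = round(f_measure(87.0, 74.5), 1)
print(f_manual, f_auto)
```

Under this assumption, the manual-boundary setting gives 83.1, matching the figure quoted in the conclusion, and the automatic-boundary setting gives about 80.3.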

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f18/2474622/b029676e9b36/1471-2105-9-277-1.jpg
