BIOSMILE: a semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features.

Author information

Tsai Richard Tzong-Han, Chou Wen-Chi, Su Ying-Shan, Lin Yu-Chun, Sung Cheng-Lung, Dai Hong-Jie, Yeh Irene Tzu-Hsuan, Ku Wei, Sung Ting-Yi, Hsu Wen-Lian

Affiliation

Institute of Information Science, Academia Sinica, Nankang, Taipei 115, Taiwan, PRoC.

Publication information

BMC Bioinformatics. 2007 Sep 1;8:325. doi: 10.1186/1471-2105-8-325.

DOI:10.1186/1471-2105-8-325
PMID:17764570
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2072962/
Abstract

BACKGROUND

Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical field. A key IE task in this field is the extraction of biomedical relations, such as protein-protein and gene-disease interactions. However, most biomedical relation extraction systems usually ignore adverbial and prepositional phrases and words identifying location, manner, timing, and condition, which are essential for describing biomedical relations. Semantic role labeling (SRL) is a natural language processing technique that identifies the semantic roles of these words or phrases in sentences and expresses them as predicate-argument structures. We construct a biomedical SRL system called BIOSMILE that uses a maximum entropy (ME) machine-learning model to extract biomedical relations. BIOSMILE is trained on BioProp, our semi-automatic, annotated biomedical proposition bank. Currently, we are focusing on 30 biomedical verbs that are frequently used or considered important for describing molecular events.
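
The paragraph above describes argument classification with a maximum entropy (ME) model, which is equivalent to multinomial logistic regression over binary features. The following is a minimal sketch of such a classifier, not BIOSMILE's implementation. The feature names (`pred=`, `phrase=`, `position=`, `voice=`, `prep=`), the four-label subset, the training examples and the hyperparameters are all hypothetical and are not taken from BioProp. A complete SRL system would also need an argument-identification step, which this sketch omits.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class MaxEntClassifier:
    """Conditional maximum-entropy model: p(y|x) proportional to exp(sum_i w_i * f_i(x, y)).

    Features are sparse binary indicators, so each (feature string, label) pair has one weight.
    """

    def __init__(self, labels: Sequence[str]):
        self.labels = list(labels)
        self.w: Dict[Tuple[str, str], float] = defaultdict(float)

    def prob(self, feats: List[str]) -> Dict[str, float]:
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def predict(self, feats: List[str]) -> str:
        p = self.prob(feats)
        return max(p, key=p.get)

    def train(self, data, epochs: int = 50, lr: float = 0.5, l2: float = 0.01) -> None:
        """Stochastic gradient ascent on the L2-regularized conditional log-likelihood."""
        for _ in range(epochs):
            for feats, gold in data:
                p = self.prob(feats)
                for f in feats:
                    for y in self.labels:
                        # Gradient = observed minus expected feature count; the L2 penalty is
                        # applied only to the weights of active features (a common simplification).
                        grad = (1.0 if y == gold else 0.0) - p[y]
                        self.w[(f, y)] += lr * (grad - l2 * self.w[(f, y)])


# Hypothetical training data: each candidate constituent of a predicate is described by
# features such as the predicate lemma, phrase type and position relative to the predicate.
TRAIN = []
for verb in ("activate", "inhibit"):
    TRAIN += [
        ([f"pred={verb}", "phrase=NP", "position=before", "voice=active"], "Arg0"),
        ([f"pred={verb}", "phrase=NP", "position=after", "voice=active"], "Arg1"),
        ([f"pred={verb}", "phrase=PP", "position=after", "prep=in"], "ArgM-LOC"),
        ([f"pred={verb}", "phrase=PP", "position=after", "prep=by"], "ArgM-MNR"),
    ]


def train_demo() -> MaxEntClassifier:
    model = MaxEntClassifier(["Arg0", "Arg1", "ArgM-LOC", "ArgM-MNR"])
    model.train(TRAIN)
    return model


if __name__ == "__main__":
    model = train_demo()
    # Label the constituents of an unseen predicate and print a predicate-argument structure
    # for the hypothetical sentence "IL-2 binds the receptor in T cells".
    candidates = {
        "IL-2": ["pred=bind", "phrase=NP", "position=before", "voice=active"],
        "the receptor": ["pred=bind", "phrase=NP", "position=after", "voice=active"],
        "in T cells": ["pred=bind", "phrase=PP", "position=after", "prep=in"],
    }
    print({"predicate": "bind", **{model.predict(f): text for text, f in candidates.items()}})
```

Because the predicate-lemma features occur equally with every label in this toy data, the model has to rely on the other features, which is what lets it label arguments of the unseen predicate `bind`.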

RESULTS

To evaluate the performance of BIOSMILE, we conducted two experiments to (1) compare the performance of SRL systems trained on newswire and biomedical corpora; and (2) examine the effects of using biomedical-specific features. The experimental results show that using BioProp improves the F-score of the SRL system by 21.45% over an SRL system that uses a newswire corpus. It is noteworthy that adding automatically generated template features improves the overall F-score by a further 0.52%. Specifically, ArgM-LOC, ArgM-MNR, and Arg2 achieve statistically significant performance improvements of 3.33%, 2.27%, and 1.44%, respectively.
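
The RESULTS paragraph reports overall and per-label F-score gains. The abstract does not state whether figures such as 21.45% are absolute percentage points or relative increases, so the sketch below computes both. It assumes that a predicted argument counts as correct only when its span and role label exactly match a gold argument. The sentence, spans and predictions are a hypothetical toy example, not BioProp data, and the significance testing mentioned in the abstract is not shown.

```python
from typing import Dict, Set, Tuple

# An argument is (sentence id, predicate, start token, end token, role label).
# Exact match of span and label is assumed to count as a correct prediction.
Arg = Tuple[int, str, int, int, str]


def prf(gold: Set[Arg], pred: Set[Arg]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-score (F1) of predicted against gold arguments."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def per_label_f(gold: Set[Arg], pred: Set[Arg]) -> Dict[str, float]:
    """F-score computed separately for each role label (e.g. Arg2, ArgM-LOC)."""
    labels = sorted({a[-1] for a in gold | pred})
    return {lab: prf({a for a in gold if a[-1] == lab},
                     {a for a in pred if a[-1] == lab})[2] for lab in labels}


def improvement(f_old: float, f_new: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return 100 * (f_new - f_old), 100 * (f_new - f_old) / f_old


# Hypothetical toy example: "IL-2 activates NF-kappaB in T cells" (tokens 0..5).
GOLD = {(0, "activate", 0, 0, "Arg0"), (0, "activate", 2, 2, "Arg1"),
        (0, "activate", 3, 5, "ArgM-LOC")}
BASELINE = {(0, "activate", 0, 0, "Arg0"), (0, "activate", 2, 2, "Arg1")}  # misses the location
WITH_TEMPLATES = set(GOLD)

if __name__ == "__main__":
    f_old, f_new = prf(GOLD, BASELINE)[2], prf(GOLD, WITH_TEMPLATES)[2]
    print(per_label_f(GOLD, BASELINE))
    print(improvement(f_old, f_new))
```

In this toy case the baseline's overall F-score is 0.8 and the improved system's is 1.0, a gain of 20 percentage points but a 25% relative increase, which is why the distinction matters when reading reported gains.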

CONCLUSION

We demonstrate the necessity of using a biomedical proposition bank for training SRL systems in the biomedical domain. Besides the different characteristics of biomedical and newswire sentences, factors such as cross-domain framesets and verb usage variations also influence the performance of SRL systems. For argument classification, we find that NE (named entity) features indicating if the target node matches with NEs are not effective, since NEs may match with a node of the parsing tree that does not have semantic role labels in the training set. We therefore incorporate templates composed of specific words, NE types, and POS tags into the SRL system. As a result, the classification accuracy for adjunct arguments, which is especially important for biomedical SRL, is improved significantly.
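
The CONCLUSION contrasts NE-match features with templates composed of specific words, NE types and POS tags. The abstract does not describe how the templates are generated, so the sketch below uses one plausible approach and should not be read as the paper's procedure: each training argument is abstracted into a template (NEs become their type, words in a hypothetical `KEEP_WORDS` list stay literal, everything else becomes its POS tag), and templates seen at least `min_count` times are kept. The NE tag set, word list, threshold and examples are all assumptions. The demo shows the contrast described above: the NE-match feature fires on the inner NP "monocytes" rather than on the PP node "in monocytes" that would carry ArgM-LOC, whereas the template feature fires on the PP.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# A token is (word, POS tag, NE tag in BIO notation, e.g. "B-cell_type", "I-cell_type", "O").
Token = Tuple[str, str, str]

# Hypothetical list of words kept literally in templates; all other non-NE words
# are abstracted to their POS tag.
KEEP_WORDS = {"in", "on", "at", "by", "via", "through", "during", "after", "under"}


def make_template(tokens: List[Token]) -> str:
    """Abstract a constituent into a template of specific words, NE types and POS tags."""
    slots = []
    for word, pos, ne in tokens:
        if ne.startswith("I-"):
            continue  # continuation of a multi-token NE already represented by its B- slot
        if ne.startswith("B-"):
            slots.append("NE:" + ne[2:])
        elif word.lower() in KEEP_WORDS:
            slots.append("W:" + word.lower())
        else:
            slots.append("POS:" + pos)
    return " ".join(slots)


def generate_templates(args: Iterable[Tuple[List[Token], str]], min_count: int = 2) -> Set[str]:
    """Keep templates that occur at least min_count times among labeled training arguments."""
    counts = Counter(make_template(tokens) for tokens, _label in args)
    return {t for t, c in counts.items() if c >= min_count}


def template_features(tokens: List[Token], templates: Set[str]) -> List[str]:
    """Binary feature that fires when the constituent matches a known template."""
    t = make_template(tokens)
    return ["tmpl=" + t] if t in templates else []


def ne_match_feature(tokens: List[Token]) -> List[str]:
    """Fires only when the constituent spans exactly one named entity."""
    tags = [ne for _, _, ne in tokens]
    if tags and tags[0].startswith("B-") and all(t == "I-" + tags[0][2:] for t in tags[1:]):
        return ["ne_match=" + tags[0][2:]]
    return []


# Hypothetical labeled arguments from a training corpus.
TRAIN_ARGS = [
    ([("in", "IN", "O"), ("T", "NN", "B-cell_type"), ("cells", "NNS", "I-cell_type")], "ArgM-LOC"),
    ([("in", "IN", "O"), ("B", "NN", "B-cell_type"), ("lymphocytes", "NNS", "I-cell_type")], "ArgM-LOC"),
    ([("by", "IN", "O"), ("binding", "VBG", "O")], "ArgM-MNR"),
    ([("by", "IN", "O"), ("phosphorylating", "VBG", "O")], "ArgM-MNR"),
    ([("IL-2", "NN", "B-protein")], "Arg0"),
]

if __name__ == "__main__":
    templates = generate_templates(TRAIN_ARGS)
    pp = [("in", "IN", "O"), ("monocytes", "NNS", "B-cell_type")]  # PP node "in monocytes"
    print(template_features(pp, templates), ne_match_feature(pp))          # template fires, NE match does not
    print(template_features(pp[1:], templates), ne_match_feature(pp[1:]))  # inner NP node: NE match fires
```

In a full system, the `tmpl=` features would simply be added to the binary feature set of the ME classifier, which can then learn, for example, that `W:in NE:cell_type` is strongly associated with ArgM-LOC.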
