Toward information extraction: identifying protein names from biological papers.

Authors

Fukuda K, Tamura A, Tsunoda T, Takagi T

Affiliation

Human Genome Center, University of Tokyo, Japan.

Publication

Pac Symp Biocomput. 1998:707-18.

PMID: 9697224

Abstract

To solve the mystery of life phenomena, we must clarify when genes are expressed and how their products interact with each other. Because the continuously updated knowledge on these interactions is massive and is available only in the form of published articles, an intelligent information extraction (IE) system is needed. To extract this information directly from articles, the system must first identify the material names. However, medical and biological documents often include proper nouns newly coined by the authors, and conventional methods based on domain-specific dictionaries cannot detect such unknown words or coinages. In this study, we propose a new method for extracting material names, PROPER, which uses surface clues in character strings. It extracts material names from sentences with 94.70% precision and 98.84% recall, regardless of whether the names are already known or newly defined.
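
The abstract names two ideas that a small example can make concrete: detecting names from surface clues in the character string (rather than from a dictionary), and scoring the result by precision and recall. The sketch below is only an illustration of that general idea and is not the authors' PROPER implementation; the specific clues (an uppercase letter after the first character, or a digit mixed with letters), the function names, and the toy sentence with its gold labels are all assumptions made for this example.

```python
import re

# Tokens are runs of letters/digits, optionally joined by hyphens.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_core_term(token: str) -> bool:
    """Guess whether a token is a name using surface clues only.

    Hypothetical clues (not taken from the paper): the token contains a
    letter, and it has either a digit or an uppercase letter somewhere
    after its first character.
    """
    if not any(c.isalpha() for c in token):
        return False  # pure numbers such as "1998" are rejected
    has_digit = any(c.isdigit() for c in token)
    has_inner_upper = any(c.isupper() for c in token[1:])
    return has_digit or has_inner_upper


def extract_candidates(sentence: str) -> list[str]:
    """Return tokens of the sentence that pass the surface-clue test."""
    return [t for t in TOKEN_RE.findall(sentence) if is_core_term(t)]


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |gold|."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy sentence and gold labels, invented for illustration only.
    sentence = "The SH2 domain of Grb2 binds to phosphorylated EGFR in vitro."
    gold = {"Grb2", "EGFR"}
    found = extract_candidates(sentence)
    print(found)
    print(precision_recall(set(found), gold))
```

Because the test never consults a dictionary, a newly coined name is treated the same way as a known one, which is the property the abstract emphasizes. The same simplicity also produces false positives (here, a domain name is flagged alongside the protein names), which is why precision and recall are reported separately.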

Similar articles

1. Toward information extraction: identifying protein names from biological papers.
   Pac Symp Biocomput. 1998:707-18.
2. Building a protein name dictionary from full text: a machine learning term extraction approach.
   BMC Bioinformatics. 2005 Apr 7;6:88. doi: 10.1186/1471-2105-6-88.
3. Protein names precisely peeled off free text.
   Bioinformatics. 2004 Aug 4;20 Suppl 1:i241-7. doi: 10.1093/bioinformatics/bth904.
4. Comparative experiments on learning information extractors for proteins and their interactions.
   Artif Intell Med. 2005 Feb;33(2):139-55. doi: 10.1016/j.artmed.2004.07.016.
5. Recognizing names in biomedical texts: a machine learning approach.
   Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.
6. Discovering patterns to extract protein-protein interactions from full texts.
   Bioinformatics. 2004 Dec 12;20(18):3604-12. doi: 10.1093/bioinformatics/bth451. Epub 2004 Jul 29.
7. Improving the performance of dictionary-based approaches in protein name recognition.
   J Biomed Inform. 2004 Dec;37(6):461-70. doi: 10.1016/j.jbi.2004.08.003.
8. Automatic extraction of gene and protein synonyms from MEDLINE and journal articles.
   Proc AMIA Symp. 2002:919-23.
9. MeSH and specialized terminologies: coverage in the field of molecular biology.
   Stud Health Technol Inform. 2004;107(Pt 1):530-4.
10. Two learning approaches for protein name extraction.
    J Biomed Inform. 2009 Dec;42(6):1046-55. doi: 10.1016/j.jbi.2009.05.004. Epub 2009 May 13.

Cited by

1. BioBBC: a multi-feature model that enhances the detection of biomedical entities.
   Sci Rep. 2024 Apr 2;14(1):7697. doi: 10.1038/s41598-024-58334-x.
2. Advancing entity recognition in biomedicine via instruction tuning of large language models.
   Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae163.
3. A BERT-Span model for Chinese named entity recognition in rehabilitation medicine.
   PeerJ Comput Sci. 2023 Aug 21;9:e1535. doi: 10.7717/peerj-cs.1535. eCollection 2023.
4. Biomedical named entity recognition based on fusion multi-features embedding.
   Technol Health Care. 2023;31(S1):111-121. doi: 10.3233/THC-236011.
5. BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework.
   BMC Bioinformatics. 2022 Nov 22;23(1):501. doi: 10.1186/s12859-022-05051-9.
6. How Do Your Biomedical Named Entity Recognition Models Generalize to Novel Entities?
   IEEE Access. 2022 Mar 8;10:31513-31523. doi: 10.1109/ACCESS.2022.3157854. eCollection 2022.
7. LPInsider: a webserver for lncRNA-protein interaction extraction from the literature.
   BMC Bioinformatics. 2022 Apr 15;23(1):135. doi: 10.1186/s12859-022-04665-3.
8. Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations.
   JMIR Med Inform. 2020 Sep 4;8(9):e19848. doi: 10.2196/19848.
9. Ten tips for a text-mining-ready article: How to improve automated discoverability and interpretability.
   PLoS Biol. 2020 Jun 1;18(6):e3000716. doi: 10.1371/journal.pbio.3000716. eCollection 2020 Jun.
10. Multitask learning for biomedical named entity recognition with cross-sharing structure.
    BMC Bioinformatics. 2019 Aug 16;20(1):427. doi: 10.1186/s12859-019-3000-5.