Curatable Named-Entity Recognition Using Semantic Relations.

Authors

Hsu Yi-Yu, Kao Hung-Yu

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2015 Jul-Aug;12(4):785-92. doi: 10.1109/TCBB.2014.2366770.

Abstract

Named-entity recognition (NER) plays an important role in the development of biomedical databases. However, existing NER tools produce a wide variety of named entities, only some of which are curatable. Classifying which named entities are curatable is therefore a straightforward way to accelerate the biocuration workflow. Co-occurrence Interaction Nexus with Named-entity Recognition (CoINNER) is a web-based tool that allows users to identify gene, chemical, disease, and action term mentions in the Comparative Toxicogenomics Database (CTD). To further discover interactions, CoINNER uses multiple advanced algorithms to recognize the mentions in the BioCreative IV CTD Track. CoINNER extends our earlier prototype system, which annotated gene, chemical, and disease mentions in PubMed abstracts at BioCreative 2012 Track I (literature triage). Its pre-tagging results are based on the state-of-the-art named-entity recognition tools from BioCreative III. Next, a method based on conditional random fields (CRFs) is proposed to predict chemical and disease mentions in the articles. Finally, action term mentions were collected by latent Dirichlet allocation (LDA). At the BioCreative IV CTD Track, the best F-measures reached for gene/protein, chemical/drug and disease NER were 54 percent, while CoINNER achieved a 61.5 percent F-measure. System URL: http://ikmbio.csie.ncku.edu.tw/coinner/introduction.htm.
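
The abstract reports its results as F-measures but does not say how mentions were matched or averaged. As a rough illustration only, the sketch below computes the standard set-based precision, recall, and F-measure for NER output. The exact-match criterion, the `f_measure` helper, and the example mentions are all assumptions made for illustration; they are not taken from CoINNER or from the BioCreative evaluation scripts.

```python
def f_measure(predicted, gold, beta=1.0):
    """Return (precision, recall, F) for two collections of mentions.

    Assumption: a predicted mention counts as correct only if it exactly
    matches a gold mention (same document, entity type, and text).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical mentions as (document id, entity type, text); not real CTD data.
gold = {
    ("doc1", "chemical", "arsenic"),
    ("doc1", "disease", "skin cancer"),
    ("doc1", "gene", "TP53"),
    ("doc1", "gene", "CDKN2A"),
}
predicted = {
    ("doc1", "chemical", "arsenic"),
    ("doc1", "gene", "TP53"),
    ("doc1", "disease", "lung cancer"),
}

precision, recall, f1 = f_measure(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

With `beta=1.0` the function returns the balanced F-measure (the harmonic mean of precision and recall), which is the usual reading of "F-measure" when no weighting is stated.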
