NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition.

Author information

Tsai Richard Tzong-Han, Hsiao Yu-Cheng, Lai Po-Ting

Publication information

Database (Oxford). 2016 Oct 25;2016:baw135. doi: 10.1093/database/baw135.

Abstract

Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce NERChem, a system that recognizes chemical named entity (NE) mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which combines chemical classes whose naming conventions are similar; (2) BioNE features, which distinguish chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: our web-based NERChem system is publicly available at iisrserv.csie.ncu.edu.tw/nerchem.
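
The three feature ideas named in the abstract can be illustrated with a minimal Python sketch. It is not NERChem's code, and every name in it is a hypothetical stand-in: the regex sub-tokenizer, the feature names, the `bione_tags` input (assumed to be aligned tags from a separate biomedical NE tagger), and the `CLASS_COMPOSITION` grouping of CHEMDNER-style classes, whose actual grouping in the paper is not given here. The sketch shows how a full-token feature keeps the whole whitespace-delimited name visible to the CRF after a chemical name is split into fine-grained sub-tokens.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical fine-grained tokenizer: each whitespace-delimited chunk is
# split into letter runs, digit runs, and single punctuation characters.
SUBTOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")

# Hypothetical class composition: CHEMDNER-style subclasses with similar
# naming conventions are merged into one label (grouping is illustrative).
CLASS_COMPOSITION = {
    "SYSTEMATIC": "SYS_FAM",
    "FAMILY": "SYS_FAM",
    "TRIVIAL": "TRIV_ABB",
    "ABBREVIATION": "TRIV_ABB",
}


def tokenize(sentence: str) -> List[Tuple[str, str]]:
    """Return (sub_token, full_token) pairs; full_token is the whitespace chunk."""
    return [
        (sub, full)
        for full in sentence.split()
        for sub in SUBTOKEN_RE.findall(full)
    ]


def token_features(
    pairs: List[Tuple[str, str]],
    i: int,
    bione_tags: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Build a CRF feature dict for sub-token i."""
    sub, full = pairs[i]
    feats: Dict[str, object] = {
        "w": sub.lower(),
        "is_digit": sub.isdigit(),
        "is_upper": sub.isupper(),
        # Full-token features repeat the enclosing chunk's properties on every
        # sub-token, so the model still sees the whole name after splitting.
        "full.w": full.lower(),
        "full.has_digit": any(c.isdigit() for c in full),
        "full.has_hyphen": "-" in full,
    }
    if bione_tags is not None:
        # Hypothetical BioNE feature: a tag from a separate biomedical NE
        # tagger (e.g. gene/protein), aligned one-to-one with the sub-tokens.
        feats["bione"] = bione_tags[i]
    # Neighbouring sub-tokens as context.
    if i > 0:
        feats["prev.w"] = pairs[i - 1][0].lower()
    else:
        feats["BOS"] = True
    if i < len(pairs) - 1:
        feats["next.w"] = pairs[i + 1][0].lower()
    else:
        feats["EOS"] = True
    return feats


def compose_label(tag: str) -> str:
    """Map a BIO tag such as 'B-FAMILY' to its composed class, e.g. 'B-SYS_FAM'."""
    if tag == "O":
        return tag
    prefix, cls = tag.split("-", 1)
    return f"{prefix}-{CLASS_COMPOSITION.get(cls, cls)}"


if __name__ == "__main__":
    pairs = tokenize("2-chloro-N-methylbenzamide inhibits")
    for i, (sub, _) in enumerate(pairs):
        print(sub, token_features(pairs, i)["full.w"])
    print(compose_label("I-TRIVIAL"))  # I-TRIV_ABB
```

Because every sub-token carries `full.w`, the model can tie `chloro`, `-` and `methylbenzamide` back to one name even though the tokenizer split them apart. The feature dicts follow the dict-per-token shape that CRF toolkits such as sklearn-crfsuite accept.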