Cimind: A phonetic-based tool for multilingual named entity recognition in biomedical texts.

Author information

Normandie Univ., TIBS - LITIS EA 4108, Rouen Normandy University, France.

Department of Biomedical Informatics, Rouen University Hospital, Normandy, France; French National Institute for Health, INSERM, LIMICS UMR-1142, France.

Publication information

J Biomed Inform. 2019 Jun;94:103176. doi: 10.1016/j.jbi.2019.103176. Epub 2019 Apr 11.

Abstract

BACKGROUND

Extracting concepts from biomedical texts is key to supporting many advanced applications, such as biomedical information retrieval. In clinical notes, however, Named Entity Recognition (NER) has to deal with various types of errors, such as spelling errors, grammatical errors, truncated sentences, and non-standard abbreviations. Moreover, in many countries NER is hampered because most of the available resources were originally developed for English texts and are only suitable for them. This paper presents Cimind, a multilingual system for named entity recognition in medical texts, based on a phonetic similarity measure.

METHODS

Cimind performs entity recognition by combining phonetic matching, which uses the DM phonetic algorithm to deal with spelling errors, with string similarity measures. It identifies terms from a controlled vocabulary in three main steps: normalization, candidate selection by phonetic similarity, and candidate ranking.
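
The three steps can be illustrated with a minimal Python sketch. It is not the Cimind implementation: the function names (`normalize`, `phonetic_key`, `build_index`, `recognize`) and the small vocabulary in the usage note are hypothetical, the phonetic key is a crude consonant-class code standing in for the DM algorithm, and the ranking uses the `difflib.SequenceMatcher` ratio as an assumed string similarity measure.

```python
import difflib
import unicodedata
from collections import defaultdict


def normalize(text):
    """Step 1: lowercase, strip accents, keep only letters and single spaces."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalpha() else " " for c in no_accents)
    return " ".join(cleaned.split())


# Crude consonant classes: an assumed stand-in, NOT the DM phonetic algorithm.
_CLASSES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def phonetic_key(term):
    """Toy phonetic key: per word, map consonants to class codes, skip other
    letters, and collapse directly adjacent letters that share a code."""
    keys = []
    for word in normalize(term).split():
        codes = []
        prev = None
        for ch in word:
            code = _CLASSES.get(ch)  # None for vowels and h, w, y
            if code and code != prev:
                codes.append(code)
            prev = code
        keys.append("".join(codes))
    return " ".join(keys)


def build_index(vocabulary):
    """Group controlled-vocabulary terms by their phonetic key."""
    index = defaultdict(list)
    for term in vocabulary:
        index[phonetic_key(term)].append(term)
    return index


def recognize(mention, index):
    """Step 2: select the terms sharing the mention's phonetic key.
    Step 3: rank them by string similarity to the mention, best first."""
    normalized = normalize(mention)
    candidates = index.get(phonetic_key(mention), [])
    scored = [
        (difflib.SequenceMatcher(None, normalized, normalize(term)).ratio(), term)
        for term in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [term for _, term in scored]
```

With an index built from the hypothetical vocabulary `["pneumonia", "pneumonie", "myocardial infarction", "diabetes mellitus"]`, the misspelled mention "miocardial infraction" shares a phonetic key with "myocardial infarction" and is mapped to it. The misspelling "pnemonia" retrieves both "pneumonia" and "pneumonie", and the string similarity step ranks "pneumonia" first.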

RESULTS

Cimind was evaluated in the 2016 and 2017 editions of the CLEF eHealth challenge, in the CépiDC/CDC tasks. In 2017, it obtained the following precision (P), recall (R), and F1 scores on each corpus: English dataset: 83.9% P, 78.3% R, 81.0% F1; French raw dataset: 85.7% P, 68.9% R, 76.4% F1; French aligned dataset: 83.5% P, 77.5% R, 80.4% F1. It ranked first in French and fourth in English among the official runs.
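
As a quick consistency check on these figures, the short sketch below recomputes F1 from the reported precision and recall. It assumes that F1 here is the usual balanced harmonic mean of P and R, which the abstract does not state explicitly; the helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall (same unit for both)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) in percent, copied from the 2017 results above.
reported = {
    "English": (83.9, 78.3, 81.0),
    "French raw": (85.7, 68.9, 76.4),
    "French aligned": (83.5, 77.5, 80.4),
}

for name, (p, r, f1) in reported.items():
    # Print the recomputed value next to the reported one for comparison.
    print(name, round(f1_score(p, r), 1), f1)
```

Under that assumption, the recomputed values agree with the reported F1 scores to one decimal place for all three datasets.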
