Recognizing Medication related Entities in Hospital Discharge Summaries using Support Vector Machine.

Author information

Doan Son, Xu Hua

Affiliation

Department of Biomedical Informatics, School of Medicine, Vanderbilt University.

Publication information

Proc Int Conf Comput Ling. 2010 Aug;2010:259-266.

Abstract

Due to the lack of annotated data sets, there are few studies on machine-learning-based approaches to extracting named entities (NEs) from clinical text. The 2009 i2b2 NLP challenge was a task to extract six types of medication-related NEs from hospital discharge summaries: medication names, dosage, mode, frequency, duration, and reason. Several machine-learning-based systems were developed for the challenge and showed good performance. Those systems often involve two steps: 1) recognition of medication-related entities; and 2) determination of the relation between a medication name and its modifiers (e.g., dosage). A few machine learning algorithms, including Conditional Random Fields (CRF) and Maximum Entropy, have been applied to the Named Entity Recognition (NER) task at the first step. In this study, we developed a Support Vector Machine (SVM) based method to recognize medication-related entities. In addition, we systematically investigated various types of features for NER in clinical text. Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the SVM-based NER system achieved its best F-score of 90.05% (93.20% Precision, 87.12% Recall) when semantic features generated from a rule-based system were included.
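The reported F-score can be related to the reported precision and recall with a short calculation. The sketch below assumes the standard balanced F-measure (the harmonic mean of precision and recall); the abstract does not state which formula was used, and the function name `f_score` is purely illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta measure; with beta = 1 this is the harmonic mean of P and R.

    Assumption: the abstract's "F-score" is the balanced F1 measure.
    """
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Precision and recall as reported in the abstract, in percent.
reported_precision = 93.20
reported_recall = 87.12

f1 = f_score(reported_precision, reported_recall)
print(f"F1 recomputed from rounded P and R: {f1:.2f}%")
```

Recomputing from the rounded figures gives a value within about 0.01 percentage points of the reported 90.05%. A gap of that size is what one would expect if precision and recall were rounded to two decimals before publication, though the abstract does not confirm this.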
