Joint segmentation and named entity recognition using dual decomposition in Chinese discharge summaries.

Author information

State Key Laboratory of Software Development Environment, Key Laboratory of Biomechanics and Mechanobiology of Ministry of Education, Beihang University, Beijing, China.

Publication information

J Am Med Inform Assoc. 2014 Feb;21(e1):e84-92. doi: 10.1136/amiajnl-2013-001806. Epub 2013 Aug 9.

DOI:10.1136/amiajnl-2013-001806

PMID:23934949

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3957392/

Abstract

OBJECTIVE

This paper has three aims: (1) to annotate a standard corpus of Chinese discharge summaries; (2) to perform word segmentation and named entity recognition on that corpus; and (3) to build a joint model that performs word segmentation and named entity recognition together.

DESIGN

Two independent systems, one for word segmentation and one for named entity recognition, were built on conditional random field models. In natural language processing, most approaches use a single model to predict outputs, but many studies have shown that combining techniques can improve performance on many tasks. We therefore proposed a joint model that uses dual decomposition to perform both tasks and to exploit the correlations between them. Three sets of features were designed to demonstrate the advantage of the proposed joint model over independent models, incremental models, and a joint model trained on combined labels.
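
The abstract does not spell out the dual decomposition formulation, so the following is only a minimal textbook-style sketch of the general idea, not the authors' implementation. It assumes that both models make a binary decision at each character position (for example, "a word boundary follows here") and that the two decisions must agree. The function names `decode` and `dual_decompose`, the toy scores, and the per-position thresholding (a stand-in for Viterbi decoding of each conditional random field) are all hypothetical.

```python
from typing import List, Tuple


def decode(scores: List[float]) -> List[int]:
    """Toy subproblem solver: choose 1 wherever the adjusted score is positive.

    Assumption: this stands in for exact decoding (e.g. Viterbi) of one model.
    """
    return [1 if s > 0 else 0 for s in scores]


def dual_decompose(
    seg_scores: List[float],
    ner_scores: List[float],
    step: float = 0.5,
    max_iter: int = 50,
) -> Tuple[List[int], int, bool]:
    """Subgradient dual decomposition over per-position agreement constraints.

    Minimizes L(u) = max_z [f(z) + u.z] + max_y [g(y) - u.y] over u,
    where f and g are the (toy) segmentation and recognition scores.
    Returns (labels, iterations used, whether the two models agreed).
    """
    u = [0.0] * len(seg_scores)  # one Lagrange multiplier per position
    z: List[int] = []
    for iteration in range(1, max_iter + 1):
        # Each model is decoded separately with multiplier-adjusted scores.
        z = decode([a + ui for a, ui in zip(seg_scores, u)])
        y = decode([b - ui for b, ui in zip(ner_scores, u)])
        if z == y:
            # Agreement certifies that z maximizes the combined score.
            return z, iteration, True
        # Subgradient step: push the two solutions toward each other.
        u = [ui - step * (zi - yi) for ui, zi, yi in zip(u, z, y)]
    return z, max_iter, False


labels, iterations, agreed = dual_decompose([2.0, 1.0, -1.0], [-1.0, -2.0, -0.5])
print(labels, iterations, agreed)
```

With these toy scores the two decoders disagree at first and then settle on the labeling that maximizes the summed score. A constant step size is used only for simplicity; it can oscillate on other inputs, which is why the sketch returns an explicit agreement flag.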

MEASUREMENTS

Micro-averaged precision (P), recall (R), and F-measure (F) were used to evaluate results.
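
As a rough illustration of micro-averaging, the sketch below pools true positives, false positives, and false negatives across all documents before computing P, R, and F. It assumes exact-match scoring of entities represented as (start, end, type) tuples; the abstract does not state the matching criterion, and the function name `micro_prf` and the entity types in the example are hypothetical.

```python
from typing import Iterable, List, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, entity type)


def micro_prf(
    gold_docs: Iterable[List[Entity]],
    pred_docs: Iterable[List[Entity]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-measure with exact matching."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)  # predicted and in the gold standard
        fp += len(pred_set - gold_set)  # predicted but not in the gold standard
        fn += len(gold_set - pred_set)  # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Two toy documents with made-up entity spans and types.
gold = [[(0, 2, "disease"), (5, 7, "drug")], [(1, 3, "symptom")]]
pred = [[(0, 2, "disease"), (5, 8, "drug")], [(1, 3, "symptom"), (4, 6, "drug")]]
p, r, f = micro_prf(gold, pred)
print(p, r, f)
```

Because the counts are pooled before division, documents with more entities contribute more to the final scores than they would under macro-averaging.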

RESULTS

The gold standard corpus was created from 336 Chinese discharge summaries totaling 71 355 words. The framework using dual decomposition achieved a 0.2% improvement in segmentation and a 1% improvement in recognition, compared with performing each of the two tasks alone.

CONCLUSIONS

Compared with performing the two tasks individually, the joint model is efficient and effective for both segmentation and recognition. The model achieved encouraging results, demonstrating the feasibility of the two tasks.
