A knowledge discovery and reuse pipeline for information extraction in clinical notes.

Affiliations

School of IT, The University of Sydney, Sydney, Australia.

Publication

J Am Med Inform Assoc. 2011 Sep-Oct;18(5):574-9. doi: 10.1136/amiajnl-2011-000302. Epub 2011 Jul 7.

PMID:21737844

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3168325/

Abstract

OBJECTIVE

Information extraction and classification of clinical data remain open challenges in natural language processing. This paper presents a cascaded method for three extraction and classification tasks on clinical data: concept annotation, assertion classification and relation classification.
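In a cascade, each stage consumes the output of the previous one: assertions are classified only for concepts found in the first stage, and relations are classified only between those concepts. The sketch below is a toy illustration of that data flow under stated assumptions, not the authors' system. The names `Concept`, `Relation`, `LEXICON` and the stage functions are hypothetical, a dictionary lookup stands in for a trained concept tagger, and the labels follow the i2b2/VA 2010 conventions (problem/test/treatment concepts, `TrAP` for "treatment is administered for medical problem").

```python
from dataclasses import dataclass

# Hypothetical data types for illustration only; the paper does not define these.
@dataclass
class Concept:
    text: str
    start: int                 # index of the first token
    end: int                   # index one past the last token
    ctype: str                 # "problem", "test" or "treatment"
    assertion: str = "present"

@dataclass
class Relation:
    left: Concept
    right: Concept
    label: str

# Stage 1: a toy dictionary lookup standing in for a trained concept tagger.
LEXICON = {("chest", "pain"): "problem", ("aspirin",): "treatment", ("ecg",): "test"}

def annotate_concepts(tokens):
    lowered = [t.lower() for t in tokens]
    concepts = []
    for phrase, ctype in LEXICON.items():
        n = len(phrase)
        for i in range(len(lowered) - n + 1):
            if tuple(lowered[i:i + n]) == phrase:
                concepts.append(Concept(" ".join(tokens[i:i + n]), i, i + n, ctype))
    return sorted(concepts, key=lambda c: c.start)

# Stage 2: assertion classification runs only on problems found by stage 1.
def classify_assertions(tokens, concepts):
    for c in concepts:
        if c.ctype == "problem":
            window = [t.lower() for t in tokens[max(0, c.start - 3):c.start]]
            if "no" in window or "denies" in window:
                c.assertion = "absent"
    return concepts

# Stage 3: relation classification considers only concept pairs from stages 1-2
# (a toy rule: a treatment paired with a present problem is labelled TrAP).
def classify_relations(concepts):
    relations = []
    for a in concepts:
        for b in concepts:
            if a.ctype == "treatment" and b.ctype == "problem" and b.assertion == "present":
                relations.append(Relation(a, b, "TrAP"))
    return relations

def run_pipeline(sentence):
    tokens = sentence.split()
    concepts = classify_assertions(tokens, annotate_concepts(tokens))
    return concepts, classify_relations(concepts)

concepts, relations = run_pipeline("Patient reports chest pain , given aspirin")
print([(c.text, c.ctype, c.assertion) for c in concepts])
# [('chest pain', 'problem', 'present'), ('aspirin', 'treatment', 'present')]
print([(r.left.text, r.label, r.right.text) for r in relations])
# [('aspirin', 'TrAP', 'chest pain')]
```

Because errors propagate down a cascade, a concept missed in stage 1 can never receive an assertion or take part in a relation, which is one reason the first stage tends to matter most.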

MATERIALS AND METHODS

A pipeline system for clinical natural language processing was developed that includes a proofreading process with gold-standard reflexive validation and correction. The information extraction system combines a machine learning approach with a rule-based approach. Its outputs were evaluated on all three tiers of the fourth i2b2/VA shared task and workshop challenge.
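One common way to combine the two approaches is to let high-precision rules fire first and fall back to a statistical model otherwise. The abstract does not say how the authors combined them, so the following is only a sketch of that generic pattern applied to assertion classification. The trigger lists in `RULES`, the toy `WEIGHTS`, the `ml_predict` scorer and the 0.6 confidence threshold are all hypothetical; a hand-weighted linear scorer stands in for a trained classifier.

```python
import math
import re

# Hypothetical NegEx-style triggers; [^.]*$ keeps a trigger within the same
# sentence as the end of the context (the text just before the concept).
RULES = [
    (re.compile(r"\b(no|denies|without)\b[^.]*$", re.I), "absent"),
    (re.compile(r"\b(possible|probable|suspected)\b[^.]*$", re.I), "possible"),
]

# Toy per-class word weights standing in for a trained statistical model.
WEIGHTS = {
    "present": {"has": 1.0, "with": 0.5},
    "hypothetical": {"if": 1.5, "should": 1.0},
}

def ml_predict(context):
    """Score each class, return the best label and its softmax probability."""
    words = context.lower().split()
    scores = {label: sum(w.get(t, 0.0) for t in words) for label, w in WEIGHTS.items()}
    label = max(scores, key=scores.get)
    z = sum(math.exp(s) for s in scores.values())
    return label, math.exp(scores[label]) / z

def hybrid_assertion(context, threshold=0.6):
    """Rules first; otherwise trust the model if confident, else default to 'present'."""
    for pattern, label in RULES:
        if pattern.search(context):
            return label, "rule"
    label, confidence = ml_predict(context)
    if confidence >= threshold:
        return label, "model"
    return "present", "default"

print(hybrid_assertion("patient denies"))             # ('absent', 'rule')
print(hybrid_assertion("return to clinic if"))        # ('hypothetical', 'model')
print(hybrid_assertion("no fever. patient reports"))  # ('present', 'default')
```

Returning the source of each decision ("rule", "model" or "default") makes it easy to audit which component is responsible for an error, which fits naturally with a proofreading step.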

RESULTS

Overall concept classification attained an F-score of 83.3% against a baseline of 77.0%; the best F-score for assertions about the concepts was 92.4%; and the relation classifier attained 72.6% for relationships between clinical concepts against a baseline of 71.0%. Micro-averaged results on the challenge test set were 81.79%, 91.90% and 70.18%, respectively.
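The F-score is the harmonic mean of precision P and recall R, F = 2PR / (P + R). A micro-average pools true positives, false positives and false negatives over all classes before computing F, so frequent classes dominate; a macro-average instead gives each class's F-score equal weight. The sketch below shows the difference on made-up counts, not data from the paper; `Counts`, `prf`, `micro_f` and `macro_f` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Counts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives

def prf(c):
    """Precision, recall and F1 for one set of counts (0.0 when undefined)."""
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def micro_f(per_class):
    """Pool counts over all classes, then compute a single F1."""
    pooled = Counts(
        tp=sum(c.tp for c in per_class.values()),
        fp=sum(c.fp for c in per_class.values()),
        fn=sum(c.fn for c in per_class.values()),
    )
    return prf(pooled)[2]

def macro_f(per_class):
    """Average the per-class F1 scores, one vote per class."""
    return sum(prf(c)[2] for c in per_class.values()) / len(per_class)

# Made-up counts: one frequent, well-handled class and one rare, poorly handled one.
counts = {"problem": Counts(tp=90, fp=10, fn=10), "test": Counts(tp=5, fp=5, fn=5)}
print(round(micro_f(counts), 4), round(macro_f(counts), 4))  # 0.8636 0.7
```

The rare class drags the macro-average down much more than the micro-average, so the two summaries can rank systems differently.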

DISCUSSION

The multi-task challenge requires time and workload to be distributed across the individual tasks, so an evaluation of overall performance on all three tasks is more informative than treating each task assessment as independent. The simplicity of the model developed in this work contrasts with the very large feature spaces used by other participants in the challenge, who achieved only slightly better performance. When comparing results, a penalty should be charged against the complexity of a model, as defined in message minimalisation theory.
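One way to read the complexity penalty is as a two-part code: a model is charged the bits needed to state the model plus the bits needed to encode the data given the model, and the shorter total wins. The sketch assumes that reading of "message minimalisation theory" and uses a deliberately crude model cost of a fixed number of bits per parameter; `two_part_length` is a hypothetical helper, and the parameter counts and probabilities in the example are invented for illustration.

```python
import math

def two_part_length(n_params, bits_per_param, true_label_probs):
    """Total message length in bits: L(model) + L(data | model).

    Assumption: L(model) is a flat cost per parameter, a crude stand-in for a
    proper model-cost term. L(data | model) is the Shannon code length of the
    correct labels, i.e. -log2 of the probability the model gave each one.
    """
    model_bits = n_params * bits_per_param
    data_bits = -sum(math.log2(p) for p in true_label_probs)
    return model_bits + data_bits

# Invented example: a compact model that fits less well vs. a huge feature space.
small = two_part_length(n_params=50, bits_per_param=16, true_label_probs=[0.25] * 1000)
large = two_part_length(n_params=10_000, bits_per_param=16, true_label_probs=[0.5] * 1000)
print(small, large)  # 2800.0 161000.0
```

Here the larger model fits the data better (1 bit per label instead of 2) but pays far more to describe itself, so under this penalty the smaller model is preferred.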

CONCLUSION

A complete pipeline system is presented for constructing language processing models that can handle multiple practical tasks of detecting language structures in clinical records.
