CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines.

Author information

Soysal Ergin, Wang Jingqi, Jiang Min, Wu Yonghui, Pakhomov Serguei, Liu Hongfang, Xu Hua

Affiliations

School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

Department of Pharmaceutical Care and Health System, University of Minnesota Twin Cities, Minneapolis, MN, USA.

Publication information

J Am Med Inform Assoc. 2018 Mar 1;25(3):331-336. doi: 10.1093/jamia/ocx132.

DOI: 10.1093/jamia/ocx132

PMID: 29186491

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7378877/

Abstract

Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.
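
To make the idea of a customized pipeline concrete, the following is a minimal sketch of the two use cases named in the abstract (smoking status and lab test values) as a chain of small components. It does not use CLAMP or reproduce its API: every name here (`run_pipeline`, `split_sentences`, `extract_labs`, `classify_smoking`, `LabValue`, the `LAB_TESTS` lexicon, and the keyword rules) is a hypothetical stand-in chosen for illustration, and the rules are far simpler than what a real clinical system would need.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Illustrative only: none of these names come from CLAMP.

@dataclass
class LabValue:
    test: str
    value: float
    unit: str

# Assumed toy lexicon; a real system would use a curated vocabulary.
LAB_TESTS = ["hemoglobin", "glucose", "creatinine"]

# Matches "<test> [was|is|of|:] <number> [unit]", e.g. "Glucose was 105 mg/dL".
LAB_PATTERN = re.compile(
    r"\b(?P<test>" + "|".join(LAB_TESTS) + r")\b"
    r"\s*(?:was|is|of|:)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[a-zA-Z/%]+)?",
    re.IGNORECASE,
)

Doc = Dict[str, Any]
Component = Callable[[Doc], Doc]

def split_sentences(doc: Doc) -> Doc:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc["text"])
    doc["sentences"] = [p.strip() for p in parts if p.strip()]
    return doc

def extract_labs(doc: Doc) -> Doc:
    # Collect every lab test mention that is followed by a numeric value.
    labs: List[LabValue] = []
    for sentence in doc["sentences"]:
        for m in LAB_PATTERN.finditer(sentence):
            labs.append(
                LabValue(m.group("test").lower(), float(m.group("value")), m.group("unit") or "")
            )
    doc["labs"] = labs
    return doc

def classify_smoking(doc: Doc) -> Doc:
    # Keyword rules checked from most to least specific.
    text = doc["text"].lower()
    if re.search(r"\b(never smoked|non-?smoker|denies (?:smoking|tobacco))\b", text):
        status = "non-smoker"
    elif re.search(r"\b(former smoker|quit smoking|ex-smoker)\b", text):
        status = "past smoker"
    elif re.search(r"\b(smoker|smokes|smoking)\b", text):
        status = "current smoker"
    else:
        status = "unknown"
    doc["smoking_status"] = status
    return doc

def run_pipeline(text: str, components: List[Component]) -> Doc:
    # Each component reads and enriches the same document dictionary.
    doc: Doc = {"text": text}
    for component in components:
        doc = component(doc)
    return doc

if __name__ == "__main__":
    note = "Patient is a former smoker. Glucose was 105 mg/dL. Hemoglobin: 13.2 g/dL."
    result = run_pipeline(note, [split_sentences, extract_labs, classify_smoking])
    print(result["smoking_status"])
    print(result["labs"])
```

Customizing such a pipeline amounts to swapping or reordering entries in the component list, for example replacing the keyword rules with a trained classifier while leaving the other components unchanged.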
