Exploring Named Entity Recognition Potential and the Value of Tailored Natural Language Processing Pipelines for Radiology, Pathology, and Progress Notes in Clinical Decision Support: Quantitative Study.

Authors

Kocaman Veysel, Cheng Fu-Yuan, Bonis Julio, Raut Ganesh, Timsina Prem, Talby David, Kia Arash

Affiliations

John Snow Labs Inc, Lewes, DE, United States.

Institute for Healthcare Delivery Science, Mount Sinai, New York, NY, United States.

Publication information

JMIR AI. 2025 Sep 5;4:e59251. doi: 10.2196/59251.

DOI:10.2196/59251
PMID:40911864
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12449662/
Abstract

BACKGROUND

Clinical notes house rich, yet unstructured, patient data, making analysis challenging due to medical jargon, abbreviations, and synonyms causing ambiguity. This complicates real-time extraction for decision support tools.

OBJECTIVE

This study aimed to examine the data curation, technology, and workflow of the named entity recognition (NER) pipeline, a component of a broader clinical decision support tool that identifies key entities using NER models and classifies these entities as present or absent in the patient through an NER assertion model.

METHODS

We gathered progress care, radiology, and pathology notes from 5000 patients, dividing them into 5 batches of 1000 patients each. Metrics such as notes and reports per patient, sentence count, token size, runtime, central processing unit, and memory use were measured per note type. We also evaluated the precision of the NER outputs and then the precision and recall of NER assertion models against manual annotations by a clinical expert.
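The batching and per-note-type measurements described above can be illustrated with a minimal sketch. It is not the study's pipeline: the function names, the `(patient_id, note_type, text)` record layout, the regex sentence splitter, and the whitespace tokenizer are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def make_batches(patient_ids, batch_size):
    """Split a list of patient IDs into consecutive batches of batch_size."""
    return [patient_ids[i:i + batch_size]
            for i in range(0, len(patient_ids), batch_size)]


def note_metrics(notes):
    """Aggregate simple size metrics per note type.

    notes: iterable of (patient_id, note_type, text) tuples (hypothetical layout).
    Sentences are split naively on ., ! or ?; tokens are whitespace-separated.
    """
    totals = defaultdict(lambda: {"patients": set(), "notes": 0,
                                  "sentences": 0, "tokens": 0})
    for patient_id, note_type, text in notes:
        t = totals[note_type]
        t["patients"].add(patient_id)
        t["notes"] += 1
        t["sentences"] += len([s for s in re.split(r"[.!?]+\s*", text) if s.strip()])
        t["tokens"] += len(text.split())
    return {
        note_type: {
            "notes_per_patient": t["notes"] / len(t["patients"]),
            "sentences_per_note": t["sentences"] / t["notes"],
            "tokens_per_note": t["tokens"] / t["notes"],
        }
        for note_type, t in totals.items()
    }


# Toy data only; the texts below are invented for the example.
example_notes = [
    (1, "radiology", "No acute findings. Lungs clear."),
    (1, "radiology", "Stable."),
    (2, "progress", "Patient reports pain. Denies fever. Plan: continue meds."),
]
batches = make_batches(list(range(5000)), 1000)
metrics = note_metrics(example_notes)
```

A production pipeline would likely use a clinical sentence detector and tokenizer rather than these naive splitters, since abbreviations and decimal values break simple punctuation rules.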

RESULTS

Using Spark natural language processing clinical pretrained NER models on 138,250 clinical notes, we observed excellent NER precision, with a peak in procedures at 0.989 (95% CI 0.977-1.000) and an accuracy in the assertion model of 0.889 (95% CI 0.856-0.922). Our analysis highlighted long-tail distributions in notes per patient, note length, and entity density. Progress care notes had notably more entities per sentence than radiology and pathology notes, showing 4-fold and 16-fold differences, respectively.
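As an illustration of how a precision estimate and its 95% CI can be derived from expert-annotated counts, the sketch below uses the normal-approximation (Wald) interval, clipped to [0, 1]. The abstract does not state which interval method the authors used, so this choice is an assumption, and the counts in the example are hypothetical rather than taken from the study.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a proportion.

    Uses the Wald interval p +/- z * sqrt(p * (1 - p) / n), clipped to [0, 1].
    z = 1.96 corresponds to a 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 90 of 100 extracted entities judged correct.
estimate, lower, upper = proportion_ci(90, 100)
precision, recall = precision_recall(tp=8, fp=2, fn=2)
```

Clipping matters when the estimate is close to 1: the unclipped Wald upper bound can exceed 1, and an interval ending at exactly 1.000, as reported for procedures, is consistent with (though not proof of) this kind of truncation.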

CONCLUSIONS

Further research should explore the analysis of clinical notes beyond the scope of our study, including discharge summaries and psychiatric evaluation notes. Recognizing the unique linguistic characteristics of different note types underscores the importance of developing specialized NER models or natural language processing pipeline setups tailored to each type. By doing so, we can enhance their performance across a more diverse range of clinical scenarios.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c858/12449662/eff5730cf361/ai_v4i1e59251_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c858/12449662/18c2ee139ed2/ai_v4i1e59251_fig2.jpg
