Exploring Named Entity Recognition Potential and the Value of Tailored Natural Language Processing Pipelines for Radiology, Pathology, and Progress Notes in Clinical Decision Support: Quantitative Study.

Authors

Kocaman Veysel, Cheng Fu-Yuan, Bonis Julio, Raut Ganesh, Timsina Prem, Talby David, Kia Arash

Affiliations

John Snow Labs Inc, Lewes, DE, United States.

Institute for Healthcare Delivery Science, Mount Sinai, New York, NY, United States.

Publication information

JMIR AI. 2025 Sep 5;4:e59251. doi: 10.2196/59251.

Abstract

BACKGROUND

Clinical notes hold rich but unstructured patient data. Medical jargon, abbreviations, and synonyms introduce ambiguity, which makes analysis difficult and complicates real-time extraction for decision support tools.

OBJECTIVE

This study aimed to examine the data curation, technology, and workflow of the named entity recognition (NER) pipeline, a component of a broader clinical decision support tool that identifies key entities using NER models and classifies these entities as present or absent in the patient through an NER assertion model.
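The two-stage design described in the objective (first find entities, then assert whether each is present or absent in the patient) can be illustrated with a toy sketch. This is a rule-based stand-in, not the pretrained NER and assertion models examined in the study: the lexicon, the negation cues, and the function names `find_entities`, `assert_status`, and `run_pipeline` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexicon; the study relied on pretrained clinical NER models instead.
LEXICON = {"pneumonia": "PROBLEM", "fever": "PROBLEM", "biopsy": "PROCEDURE"}

# Hypothetical negation cues; a trained assertion model would replace this list.
NEGATION_CUES = ("no", "denies", "without", "negative for")


def find_entities(sentence):
    """Stage 1 (stand-in for NER): locate lexicon terms in the sentence."""
    lowered = sentence.lower()
    entities = []
    for term, label in LEXICON.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            entities.append({"text": term, "label": label, "start": match.start()})
    # Return entities in the order they appear in the sentence.
    return sorted(entities, key=lambda e: e["start"])


def assert_status(sentence, entity):
    """Stage 2 (stand-in for assertion): 'absent' if a negation cue precedes the entity."""
    prefix = sentence.lower()[: entity["start"]]
    negated = any(
        re.search(r"\b" + re.escape(cue) + r"\b", prefix) for cue in NEGATION_CUES
    )
    return "absent" if negated else "present"


def run_pipeline(sentence):
    """Chain both stages and return (text, label, status) tuples."""
    return [
        (e["text"], e["label"], assert_status(sentence, e))
        for e in find_entities(sentence)
    ]


print(run_pipeline("Patient denies fever."))
print(run_pipeline("Biopsy confirmed pneumonia."))
```

Keeping the assertion step separate from entity detection mirrors the structure the objective describes, so either stage could be swapped for a trained model without touching the other.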

METHODS

We gathered progress care, radiology, and pathology notes from 5000 patients, dividing them into 5 batches of 1000 patients each. Metrics such as notes and reports per patient, sentence count, token size, runtime, central processing unit, and memory use were measured per note type. We also evaluated the precision of the NER outputs and then the precision and recall of NER assertion models against manual annotations by a clinical expert.
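As a rough illustration of the batching and per-note measurements described above, the following sketch splits 5000 patient identifiers into 5 batches of 1000 and computes sentence and token counts for one note. The identifiers are synthetic, the helper names `split_into_batches` and `note_metrics` are hypothetical, and the naive punctuation-based sentence splitter and whitespace tokenizer are assumptions rather than the tooling used in the study.

```python
import re


def split_into_batches(patient_ids, batch_size):
    """Split a list of patient IDs into consecutive batches of batch_size."""
    return [
        patient_ids[i : i + batch_size]
        for i in range(0, len(patient_ids), batch_size)
    ]


def note_metrics(text):
    """Crude per-note metrics: sentence count and whitespace token count."""
    # Naive split on sentence-ending punctuation; a production pipeline
    # would use a dedicated clinical sentence detector.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = text.split()
    return {"sentences": len(sentences), "tokens": len(tokens)}


# Synthetic patient identifiers standing in for the 5000-patient cohort.
patient_ids = list(range(5000))
batches = split_into_batches(patient_ids, 1000)
print(len(batches), [len(b) for b in batches])

# Example note text invented for illustration only.
print(note_metrics("No acute fracture. Lungs are clear."))
```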

RESULTS

Using Spark natural language processing clinical pretrained NER models on 138,250 clinical notes, we observed excellent NER precision, with a peak in procedures at 0.989 (95% CI 0.977-1.000) and an accuracy in the assertion model of 0.889 (95% CI 0.856-0.922). Our analysis highlighted long-tail distributions in notes per patient, note length, and entity density. Progress care notes had notably more entities per sentence than radiology and pathology notes, showing 4-fold and 16-fold differences, respectively.
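The abstract does not state how the 95% confidence intervals were derived. One common choice for a proportion such as precision or accuracy is the normal-approximation (Wald) interval, clipped to the range 0 to 1. The sketch below assumes that method, and the counts passed to it are hypothetical and are not the study's data.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a normal-approximation interval.

    z=1.96 corresponds to a two-sided 95% interval. Bounds are clipped to [0, 1].
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 90 correct predictions out of 100 reviewed entities.
p, low, high = wald_interval(90, 100)
print(f"{p:.3f} (95% CI {low:.3f}-{high:.3f})")
```

When the point estimate is close to 1, the unclipped upper bound can exceed 1, so clipping yields an upper limit of exactly 1.000. For proportions near the boundary, alternatives such as the Wilson interval are often preferred.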

CONCLUSIONS

Further research should explore the analysis of clinical notes beyond the scope of our study, including discharge summaries and psychiatric evaluation notes. Recognizing the unique linguistic characteristics of different note types underscores the importance of developing specialized NER models or natural language processing pipeline setups tailored to each type. By doing so, we can enhance their performance across a more diverse range of clinical scenarios.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c858/12449662/eff5730cf361/ai_v4i1e59251_fig1.jpg
