
Comparing NER Approaches on French Clinical Text, with Easy-to-Reuse Pipelines.

Affiliations

Inria, HeKA, PariSanté Campus, Paris, France.

Centre de Recherche des Cordeliers, Inserm, Université Paris Cité, Sorbonne Université, France.

Publication information

Stud Health Technol Inform. 2024 Aug 22;316:272-276. doi: 10.3233/SHTI240396.

DOI: 10.3233/SHTI240396
PMID: 39176725

Abstract

The task of Named Entity Recognition (NER) is central for leveraging the content of clinical texts in observational studies. Indeed, texts contain a large part of the information available in Electronic Health Records (EHRs). However, clinical texts are highly heterogeneous between healthcare services and institutions, between countries and languages, making it hard to predict how existing tools may perform on a particular corpus. We compared four NER approaches on three French corpora and share our benchmarking pipeline in an open and easy-to-reuse manner, using the medkit Python library. We include in our pipelines fine-tuning operations with either one or several of the considered corpora. Our results illustrate the expected superiority of language models over a dictionary-based approach, and question the necessity of refining models already trained on biomedical texts. Beyond benchmarking, we believe sharing reusable and customizable pipelines for comparing fast-evolving Natural Language Processing (NLP) tools is a valuable contribution, since clinical texts themselves can hardly be shared for privacy concerns.
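
The comparison the abstract describes, a dictionary-based approach scored against annotated entities, can be illustrated with a minimal sketch. This is not the authors' pipeline and does not use the medkit API; the function names, the toy French sentence, the lexicon, and the labels are all assumptions made for illustration, and exact-span matching is only one of several possible scoring conventions.

```python
import re
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def dictionary_ner(text: str, lexicon: Dict[str, str]) -> Set[Span]:
    """Tag every case-insensitive whole-word occurrence of a lexicon term."""
    spans: Set[Span] = set()
    for term, label in lexicon.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            spans.add((match.start(), match.end(), label))
    return spans


def exact_match_scores(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 with exact span and label match."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def span_of(text: str, mention: str, label: str) -> Span:
    """Build a gold span from the first occurrence of a mention (toy helper)."""
    start = text.index(mention)
    return (start, start + len(mention), label)


# Invented toy data, not taken from any of the corpora in the study.
TEXT = "Patient diabétique traité par metformine, sans hypertension."
LEXICON = {"metformine": "DRUG", "hypertension": "PROBLEM"}
GOLD = {
    span_of(TEXT, "diabétique", "PROBLEM"),
    span_of(TEXT, "metformine", "DRUG"),
    span_of(TEXT, "hypertension", "PROBLEM"),
}

predicted = dictionary_ner(TEXT, LEXICON)
precision, recall, f1 = exact_match_scores(GOLD, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the lexicon misses "diabétique", so precision stays perfect while recall drops. That is the kind of coverage gap a dictionary-based approach typically shows when the vocabulary of a corpus differs from the lexicon.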

Similar articles

1
Comparing NER Approaches on French Clinical Text, with Easy-to-Reuse Pipelines.
Stud Health Technol Inform. 2024 Aug 22;316:272-276. doi: 10.3233/SHTI240396.
2
Prescription of Controlled Substances: Benefits and Risks
3
Healthcare workers' informal uses of mobile phones and other mobile devices to support their work: a qualitative evidence synthesis.
Cochrane Database Syst Rev. 2024 Aug 27;8(8):CD015705. doi: 10.1002/14651858.CD015705.pub2.
4
Multicriteria Optimization of Language Models for Heart Failure With Preserved Ejection Fraction Symptom Detection in Spanish Electronic Health Records: Comparative Modeling Study.
J Med Internet Res. 2025 Jul 17;27:e76433. doi: 10.2196/76433.
5
De-identification of clinical free text using natural language processing: A systematic review of current approaches.
Artif Intell Med. 2024 May;151:102845. doi: 10.1016/j.artmed.2024.102845. Epub 2024 Mar 20.
6
Sexual Harassment and Prevention Training
7
Transformers for extracting breast cancer information from Spanish clinical narratives.
Artif Intell Med. 2023 Sep;143:102625. doi: 10.1016/j.artmed.2023.102625. Epub 2023 Jul 13.
8
Natural language processing in medical text processing: A scoping literature review.
Int J Med Inform. 2025 Dec;204:106049. doi: 10.1016/j.ijmedinf.2025.106049. Epub 2025 Jul 17.
9
Artificial intelligence in healthcare text processing: a review applied to named entity recognition.
Front Artif Intell. 2025 Jul 7;8:1584203. doi: 10.3389/frai.2025.1584203. eCollection 2025.
10
Toward Cross-Hospital Deployment of Natural Language Processing Systems: Model Development and Validation of Fine-Tuned Large Language Models for Disease Name Recognition in Japanese.
JMIR Med Inform. 2025 Jul 8;13:e76773. doi: 10.2196/76773.

Cited by

1
Facilitating phenotyping from clinical texts: the medkit library.
Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae681.