

Leveraging large language models to mimic domain expert labeling in unstructured text-based electronic healthcare records in non-english languages.

Author information

Akbasli Izzet Turkalp, Birbilen Ahmet Ziya, Teksam Ozlem

Affiliations

Division of Pediatric Emergency, Department of Pediatrics, Faculty of Medicine, Hacettepe University, Ankara, Turkey.

Life Support Center, Digital Health and Artificial Intelligence on Critical Care, Hacettepe University, Ankara, Turkey.

Publication information

BMC Med Inform Decis Mak. 2025 Mar 31;25(1):154. doi: 10.1186/s12911-025-02871-6.

DOI:10.1186/s12911-025-02871-6
PMID:40165165
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11959812/
Abstract

BACKGROUND

The integration of big data and artificial intelligence (AI) in healthcare, particularly through the analysis of electronic health records (EHR), presents significant opportunities for improving diagnostic accuracy and patient outcomes. However, the challenge of processing and accurately labeling vast amounts of unstructured data remains a critical bottleneck, necessitating efficient and reliable solutions. This study investigates the ability of domain-specific, fine-tuned large language models (LLMs) to classify unstructured EHR texts containing typographical errors through named entity recognition tasks, aiming to improve the efficiency and reliability of supervised learning AI models in healthcare.

METHODS

Turkish clinical notes from pediatric emergency room admissions at Hacettepe University İhsan Doğramacı Children's Hospital from 2018 to 2023 were analyzed. The data were preprocessed with open-source Python libraries and categorized using a pretrained GPT-3 model, "text-davinci-003," both before and after fine-tuning with domain-specific data on respiratory tract infections (RTI). The model's predictions were compared against ground truth labels established by pediatric specialists.
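The abstract does not describe how the labeled notes were formatted for fine-tuning. As a minimal sketch, assuming the prompt/completion JSONL layout used by OpenAI's legacy GPT-3 fine-tuning, labeled Turkish notes could be normalized and serialized as below. The separator `\n\n###\n\n`, the stop token ` END`, the label strings, and all function names are hypothetical and are not taken from the paper. One detail matters for Turkish text: Python's built-in `str.lower()` uses locale-independent Unicode case mapping, so it turns dotless capital `I` into `i` and `İ` into `i` plus a combining dot; the helper maps both letters explicitly before lowercasing.

```python
import json
import re

# Hypothetical separator and stop token, in the style of legacy
# prompt/completion fine-tuning guidance; not the authors' actual values.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_TOKEN = " END"


def turkish_lower(text: str) -> str:
    """Lowercase Turkish text: 'I' -> 'ı' and 'İ' -> 'i'.

    Plain str.lower() would map 'I' to 'i' and 'İ' to 'i' + U+0307,
    both of which are wrong for Turkish.
    """
    return text.replace("I", "ı").replace("İ", "i").lower()


def normalize_note(text: str) -> str:
    """Turkish-aware lowercasing plus whitespace collapsing."""
    return re.sub(r"\s+", " ", turkish_lower(text)).strip()


def to_finetune_record(note: str, label: str) -> str:
    """Serialize one labeled note as a single prompt/completion JSONL line."""
    record = {
        # Fixed separator marks where the prompt ends.
        "prompt": normalize_note(note) + PROMPT_SEPARATOR,
        # Completion starts with a space and ends with a stop token.
        "completion": " " + label + STOP_TOKEN,
    }
    # ensure_ascii=False keeps Turkish characters readable in the file.
    return json.dumps(record, ensure_ascii=False)


def write_jsonl(pairs, path):
    """Write (note, label) pairs to a UTF-8 JSONL training file."""
    with open(path, "w", encoding="utf-8") as fh:
        for note, label in pairs:
            fh.write(to_finetune_record(note, label) + "\n")
```

For instance, `to_finetune_record("ATEŞ VE ÖKSÜRÜK", "RTI")` produces one JSON line whose prompt is the normalized note followed by the separator. Only lowercasing and whitespace are normalized here; the sketch does not correct typographical errors.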

RESULTS

Out of 24,229 patient records classified as poorly labeled, 18,879 were identified as free of typographical errors and confirmed as RTI through filtering methods. The fine-tuned model achieved 99.88% accuracy, significantly outperforming the pretrained model's 78.54% accuracy in identifying RTI cases among the remaining records. The fine-tuned model outperformed the pretrained model on all evaluated performance metrics.
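The abstract reports accuracy against specialist labels and refers to other performance metrics without naming them. A minimal sketch of comparing model output with expert ground truth, assuming a two-class scheme with a hypothetical positive label `"RTI"`, could compute the usual confusion-matrix metrics as follows; which metrics the authors actually reported is not stated here, and the function name is illustrative. For scale: if "the remaining records" are those not resolved by filtering, they would number 24,229 - 18,879 = 5,350, although the abstract does not state the evaluation set size.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="RTI"):
    """Compare model labels with expert (ground-truth) labels.

    Treats `positive` as the class of interest and returns accuracy,
    precision, recall (sensitivity), specificity and F1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")

    # Tally the four confusion-matrix cells.
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero for empty cells.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

For example, `binary_metrics(["RTI", "RTI", "other", "other"], ["RTI", "other", "other", "RTI"])` yields 0.5 for every metric, because each confusion-matrix cell holds exactly one case. With a heavily imbalanced label distribution, accuracy alone can look high even when recall is poor, which is why the other metrics are worth reporting alongside it.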

CONCLUSIONS

Fine-tuned LLMs can categorize unstructured EHR data with high accuracy, closely approximating the performance of domain experts. This approach significantly reduces the time and costs associated with manual data labeling, demonstrating the potential to streamline the processing of large-scale healthcare data for AI applications.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/46c6/11959812/dea36f98f514/12911_2025_2871_Fig1_HTML.jpg


