The potential of Generative Pre-trained Transformer 4 (GPT-4) to analyse medical notes in three different languages: a retrospective model-evaluation study.

Authors

Menezes Maria Clara Saad, Hoffmann Alexander F, Tan Amelia L M, Nalbandyan Mariné, Omenn Gilbert S, Mazzotti Diego R, Hernández-Arango Alejandro, Visweswaran Shyam, Venkatesh Shruthi, Mandl Kenneth D, Bourgeois Florence T, Lee James W K, Makmur Andrew, Hanauer David A, Semanik Michael G, Kerivan Lauren T, Hill Terra, Forero Julian, Restrepo Carlos, Vigna Matteo, Ceriana Piero, Abu-El-Rub Noor, Avillach Paul, Bellazzi Riccardo, Callaci Thomas, Gutiérrez-Sacristán Alba, Malovini Alberto, Mathew Jomol P, Morris Michele, Murthy Venkatesh L, Buonocore Tommaso M, Parimbelli Enea, Patel Lav P, Sáez Carlos, Samayamuthu Malarkodi Jebathilagam, Thompson Jeffrey A, Tibollo Valentina, Xia Zongqi, Kohane Isaac S

Affiliations

Department of Biomedical Informatics, Medical School, Harvard University, Boston, MA, USA; Department of Internal Medicine, University of Texas at Southwestern, Dallas, TX, USA.

Department of Biomedical Informatics, Medical School, Harvard University, Boston, MA, USA.

Publication information

Lancet Digit Health. 2025 Jan;7(1):e35-e43. doi: 10.1016/S2589-7500(24)00246-2.

DOI: 10.1016/S2589-7500(24)00246-2
PMID: 39722251

Abstract

BACKGROUND

Patient notes contain substantial information but are difficult for computers to analyse due to their unstructured format. Large-language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4), have changed our ability to process text, but we do not know how effectively they handle medical notes. We aimed to assess the ability of GPT-4 to answer predefined questions after reading medical notes in three different languages.

METHODS

For this retrospective model-evaluation study, we included eight university hospitals from four countries (ie, the USA, Colombia, Singapore, and Italy). Each site submitted seven de-identified medical notes related to seven separate patients to the coordinating centre between June 1, 2023, and Feb 28, 2024. Medical notes were written between Feb 1, 2020, and June 1, 2023. One site provided medical notes in Spanish, one site provided notes in Italian, and the remaining six sites provided notes in English. We included admission notes, progress notes, and consultation notes. No discharge summaries were included in this study. We advised participating sites to choose medical notes that, at time of hospital admission, were for patients who were male or female, aged 18-65 years, had a diagnosis of obesity, had a diagnosis of COVID-19, and had submitted an admission note. Adherence to these criteria was optional and participating sites randomly chose which medical notes to submit. When entering information into GPT-4, we prepended each medical note with an instruction prompt and a list of 14 questions that had been chosen a priori. Each medical note was individually given to GPT-4 in its original language and in separate sessions; the questions were always given in English. At each site, two physicians independently validated responses by GPT-4 and responded to all 14 questions. Each pair of physicians evaluated responses from GPT-4 to the seven medical notes from their own site only. Physicians were not masked to responses from GPT-4 before providing their own answers, but were masked to responses from the other physician.
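The prompt construction described above (instruction, then the question list, then the note, one note per session) can be sketched as follows. This is a minimal illustration only: the abstract does not reproduce the actual instruction prompt or the 14 questions, so `INSTRUCTION`, `QUESTIONS`, `build_prompt`, and `build_prompts` are hypothetical names with placeholder content, and no model API is called.

```python
from typing import List, Sequence

# Placeholder instruction: the study's real instruction prompt is not given in the abstract.
INSTRUCTION = "Read the medical note below and answer each question."

# Placeholder questions: the study used 14 questions chosen a priori,
# which are not listed in the abstract.
QUESTIONS: Sequence[str] = tuple(f"Placeholder question {i}" for i in range(1, 15))


def build_prompt(
    note: str,
    instruction: str = INSTRUCTION,
    questions: Sequence[str] = QUESTIONS,
) -> str:
    """Prepend the instruction and the numbered English questions to one note.

    The note is passed through unchanged, in its original language.
    """
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return f"{instruction}\n\nQuestions:\n{numbered}\n\nMedical note:\n{note}"


def build_prompts(notes: Sequence[str]) -> List[str]:
    """Build one prompt per note so each note can be sent in a separate session."""
    return [build_prompt(note) for note in notes]


if __name__ == "__main__":
    # Fictitious note text, used only to show the layout of the assembled prompt.
    demo_note = "Nota clinica di esempio (testo fittizio)."
    print(build_prompt(demo_note))
```

Keeping the questions in English while leaving the note untouched mirrors the design stated in the methods; how each prompt was actually submitted to GPT-4 is not described in the abstract and is left out here.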

FINDINGS

We collected 56 medical notes, of which 42 (75%) were in English, seven (13%) were in Italian, and seven (13%) were in Spanish. For each medical note, GPT-4 responded to 14 questions, resulting in 784 responses. In 622 (79%, 95% CI 76-82) of 784 responses, both physicians agreed with GPT-4. In 82 (11%, 8-13) responses, only one physician agreed with GPT-4. In the remaining 80 (10%, 8-13) responses, neither physician agreed with GPT-4. Both physicians agreed with GPT-4 more often for medical notes written in Spanish (86 [88%, 95% CI 79-93] of 98 responses) and Italian (82 [84%, 75-90] of 98 responses) than in English (454 [77%, 74-80] of 588 responses).
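The counts reported above can be checked with a few lines of arithmetic. The sketch below recomputes each proportion and attaches a 95% Wilson score interval. The abstract does not state which interval method the authors used, so the Wilson interval is an assumption and its bounds may differ from the published ones by about a percentage point; `wilson_ci` and `COUNTS` are illustrative names.

```python
from math import sqrt
from typing import Dict, Tuple


def wilson_ci(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method, not the paper's stated one)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# (agreeing responses, total responses), taken from the Findings paragraph.
COUNTS: Dict[str, Tuple[int, int]] = {
    "both physicians agreed (all notes)": (622, 784),
    "one physician agreed (all notes)": (82, 784),
    "neither physician agreed (all notes)": (80, 784),
    "both agreed, Spanish notes": (86, 98),
    "both agreed, Italian notes": (82, 98),
    "both agreed, English notes": (454, 588),
}

if __name__ == "__main__":
    # Internal consistency: 56 notes x 14 questions, and the three outcomes partition all responses.
    assert 56 * 14 == 784
    assert 622 + 82 + 80 == 784
    assert 86 + 82 + 454 == 622

    for label, (k, n) in COUNTS.items():
        lo, hi = wilson_ci(k, n)
        print(f"{label}: {k}/{n} = {100 * k / n:.1f}% (Wilson 95% CI {100 * lo:.1f}-{100 * hi:.1f})")
```

The consistency checks confirm that the per-language counts of full agreement (86, 82, and 454) add up to the overall 622, and that the three agreement categories account for all 784 responses.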

INTERPRETATION

The results of our model-evaluation study suggest that GPT-4 is accurate when analysing medical notes in three different languages. In the future, research should explore how LLMs can be integrated into clinical workflows to maximise their use in health care.

FUNDING

None.
