Suppr 超能文献


Comparison of medical history documentation efficiency and quality based on GPT-4o: a study on the comparison between residents and artificial intelligence

Authors

Lu Xiaojing, Gao Xinqi, Wang Xinyi, Gong Zhenye, Cheng Jie, Hu Weiguo, Wu Shaun, Wang Rong, Li Xiaoyang

Affiliations

Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.

WORK Medical Technology Group LTD, Hangzhou, China.

Publication

Front Med (Lausanne). 2025 May 14;12:1545730. doi: 10.3389/fmed.2025.1545730. eCollection 2025.

DOI: 10.3389/fmed.2025.1545730
PMID: 40438356
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12116629/
Abstract

BACKGROUND

As medical technology advances, physicians' responsibilities in clinical practice continue to increase, with medical history documentation becoming an essential component. Artificial Intelligence (AI) technologies, particularly advances in Natural Language Processing (NLP), have introduced new possibilities for medical documentation. This study aims to evaluate the efficiency and quality of medical history documentation by ChatGPT-4o compared to resident physicians and explore the potential applications of AI in clinical documentation.

METHODS

Using a non-inferiority design, this study compared the documentation time and quality scores between 5 resident physicians from the hematology department (with an average of 2.4 years of clinical experience) and ChatGPT-4o based on identical case materials. Medical history quality was evaluated by two attending physicians with over 10 years of clinical experience using ten case content criteria. Data were analyzed using paired t-tests and Wilcoxon signed-rank tests, with Kappa coefficients used to assess scoring consistency. Detailed scoring criteria included completeness (coverage of history elements), accuracy (correctness of information), logic (organization and coherence of content), and professionalism (appropriate use of medical terminology and format), each rated on a 10-point scale.
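The inter-rater agreement step described above can be illustrated with a minimal Cohen's kappa computation. This is a generic sketch of the statistic, not the study's actual analysis code, and the rater labels below are hypothetical, not the study's data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is agreement expected by chance from the raters' marginals.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement under independent raters with these label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical quality ratings from two evaluators (8 items).
a = ["good", "good", "fair", "good", "fair", "good", "good", "fair"]
b = ["good", "good", "fair", "good", "good", "good", "good", "fair"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # prints kappa = 0.71
```

A kappa of 0.82, as reported in the study, is conventionally read as strong agreement between the two attending-physician evaluators.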

RESULTS

In terms of medical history quality, ChatGPT-4o achieved an average score of 88.9, while resident physicians scored 89.6, with no statistically significant difference between the two (p = 0.25). The Kappa coefficient between the two evaluators was 0.82, indicating good consistency in scoring. Non-inferiority testing showed that ChatGPT-4o's quality scores fell within the preset non-inferiority margin (5 points), indicating that its documentation quality was not inferior to that of resident physicians. ChatGPT-4o's average documentation time was 40.1 s, significantly shorter than the resident physicians' average of 14.9 min (p < 0.001).
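The non-inferiority logic above can be sketched numerically with the summary values reported in the abstract (the confidence-interval step of a full non-inferiority test is omitted here for simplicity):

```python
# Summary statistics from the abstract.
resident_mean = 89.6   # resident physicians' average quality score
gpt4o_mean = 88.9      # ChatGPT-4o's average quality score
margin = 5.0           # preset non-inferiority margin (points)

# ChatGPT-4o is "not inferior" if it scores no more than `margin`
# points below the residents' mean.
difference = resident_mean - gpt4o_mean
non_inferior = difference < margin
print(f"difference = {difference:.1f} points, non-inferior: {non_inferior}")

# Documentation-time comparison: 14.9 min for residents vs 40.1 s for GPT-4o.
speedup = (14.9 * 60) / 40.1
print(f"ChatGPT-4o was ~{speedup:.0f}x faster")
```

The 0.7-point gap is well inside the 5-point margin, which is why the study concludes non-inferiority, while the roughly 22-fold time reduction drives the efficiency claim.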

CONCLUSION

While maintaining quality comparable to resident physicians, ChatGPT-4o significantly reduced the time required for medical history documentation. Despite these positive results, practical considerations such as data preprocessing, data security, and privacy protection must be addressed in real-world applications. Future research should further explore ChatGPT-4o's capabilities in handling complex cases and its applicability across different clinical settings.


Figure images (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6ba/12116629/4cffd3e04eaa/fmed-12-1545730-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6ba/12116629/32fe3f563650/fmed-12-1545730-g0002.jpg

Similar articles

1. Comparison of medical history documentation efficiency and quality based on GPT-4o: a study on the comparison between residents and artificial intelligence.
Front Med (Lausanne). 2025 May 14;12:1545730. doi: 10.3389/fmed.2025.1545730. eCollection 2025.
2. AI-powered standardised patients: evaluating ChatGPT-4o's impact on clinical case management in intern physicians.
BMC Med Educ. 2025 Feb 20;25(1):278. doi: 10.1186/s12909-025-06877-6.
3. Patient Triage and Guidance in Emergency Departments Using Large Language Models: Multimetric Study.
J Med Internet Res. 2025 May 15;27:e71613. doi: 10.2196/71613.
4. ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis.
JMIR Med Educ. 2024 Nov 6;10:e63430. doi: 10.2196/63430.
5. Assessing the accuracy and clinical utility of GPT-4O in abnormal blood cell morphology recognition.
Digit Health. 2024 Nov 5;10:20552076241298503. doi: 10.1177/20552076241298503. eCollection 2024 Jan-Dec.
6. An assessment of ChatGPT in error detection for thyroid ultrasound reports: A comparative study with ultrasound physicians.
Digit Health. 2025 Mar 13;11:20552076251326019. doi: 10.1177/20552076251326019. eCollection 2025 Jan-Dec.
7. Generative pre-trained transformer 4o (GPT-4o) in solving text-based multiple response questions for European Diploma in Radiology (EDiR): a comparative study with radiologists.
Insights Imaging. 2025 Mar 22;16(1):66. doi: 10.1186/s13244-025-01941-7.
8. Assessing ChatGPT for Clinical Decision-Making in Radiation Oncology, With Open-Ended Questions and Images.
Pract Radiat Oncol. 2025 Apr 29. doi: 10.1016/j.prro.2025.04.009.
9. GPT-4o's competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study.
J Educ Eval Health Prof. 2024;21:21. doi: 10.3352/jeehp.2024.21.21. Epub 2024 Aug 20.
10. Exploring ChatGPT's potential for augmenting post-editing in machine translation across multiple domains: challenges and opportunities.
Front Artif Intell. 2025 May 1;8:1526293. doi: 10.3389/frai.2025.1526293. eCollection 2025.
