Suppr 超能文献


Comparison of medical history documentation efficiency and quality based on GPT-4o: a study on the comparison between residents and artificial intelligence

Authors

Lu Xiaojing, Gao Xinqi, Wang Xinyi, Gong Zhenye, Cheng Jie, Hu Weiguo, Wu Shaun, Wang Rong, Li Xiaoyang

Affiliations

Department of Medical Education, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.

WORK Medical Technology Group LTD, Hangzhou, China.

Publication

Front Med (Lausanne). 2025 May 14;12:1545730. doi: 10.3389/fmed.2025.1545730. eCollection 2025.

DOI: 10.3389/fmed.2025.1545730
PMID: 40438356
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12116629/
Abstract

BACKGROUND

As medical technology advances, physicians' responsibilities in clinical practice continue to increase, with medical history documentation becoming an essential component. Artificial Intelligence (AI) technologies, particularly advances in Natural Language Processing (NLP), have introduced new possibilities for medical documentation. This study aims to evaluate the efficiency and quality of medical history documentation by ChatGPT-4o compared to resident physicians and explore the potential applications of AI in clinical documentation.

METHODS

Using a non-inferiority design, this study compared the documentation time and quality scores between 5 resident physicians from the hematology department (with an average of 2.4 years of clinical experience) and ChatGPT-4o based on identical case materials. Medical history quality was evaluated by two attending physicians with over 10 years of clinical experience using ten case content criteria. Data were analyzed using paired t-tests and Wilcoxon signed-rank tests, with Kappa coefficients used to assess scoring consistency. Detailed scoring criteria included completeness (coverage of history elements), accuracy (correctness of information), logic (organization and coherence of content), and professionalism (appropriate use of medical terminology and format), each rated on a 10-point scale.
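The inter-rater agreement step described above can be illustrated with a minimal Cohen's kappa computation. This is a generic sketch of the statistic, not the study's actual analysis code, and the rater labels below are hypothetical, not the study's data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is agreement expected by chance from the raters' marginals.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement under independent raters with these label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical quality ratings from two evaluators (8 items).
a = ["good", "good", "fair", "good", "fair", "good", "good", "fair"]
b = ["good", "good", "fair", "good", "good", "good", "good", "fair"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # prints kappa = 0.71
```

A kappa of 0.82, as reported in the study, is conventionally read as strong agreement between the two attending-physician evaluators.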

RESULTS

In terms of medical history quality, ChatGPT-4o achieved an average score of 88.9, while resident physicians scored 89.6, with no statistically significant difference between the two (p = 0.25). The Kappa coefficient between the two evaluators was 0.82, indicating good consistency in scoring. Non-inferiority testing showed that ChatGPT-4o's quality scores fell within the preset non-inferiority margin (5 points), indicating that its documentation quality was not inferior to that of resident physicians. ChatGPT-4o's average documentation time was 40.1 s, significantly shorter than the resident physicians' average of 14.9 min (p < 0.001).
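The non-inferiority logic above can be sketched numerically with the summary values reported in the abstract (the confidence-interval step of a full non-inferiority test is omitted here for simplicity):

```python
# Summary statistics from the abstract.
resident_mean = 89.6   # resident physicians' average quality score
gpt4o_mean = 88.9      # ChatGPT-4o's average quality score
margin = 5.0           # preset non-inferiority margin (points)

# ChatGPT-4o is "not inferior" if it scores no more than `margin`
# points below the residents' mean.
difference = resident_mean - gpt4o_mean
non_inferior = difference < margin
print(f"difference = {difference:.1f} points, non-inferior: {non_inferior}")

# Documentation-time comparison: 14.9 min for residents vs 40.1 s for GPT-4o.
speedup = (14.9 * 60) / 40.1
print(f"ChatGPT-4o was ~{speedup:.0f}x faster")
```

The 0.7-point gap is well inside the 5-point margin, which is why the study concludes non-inferiority, while the roughly 22-fold time reduction drives the efficiency claim.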

CONCLUSION

While maintaining quality comparable to resident physicians, ChatGPT-4o significantly reduced the time required for medical history documentation. Despite these positive results, practical considerations such as data preprocessing, data security, and privacy protection must be addressed in real-world applications. Future research should further explore ChatGPT-4o's capabilities in handling complex cases and its applicability across different clinical settings.


Figure images (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6ba/12116629/4cffd3e04eaa/fmed-12-1545730-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6ba/12116629/32fe3f563650/fmed-12-1545730-g0002.jpg

Similar articles

1. Comparison of medical history documentation efficiency and quality based on GPT-4o: a study on the comparison between residents and artificial intelligence.
Front Med (Lausanne). 2025 May 14;12:1545730. doi: 10.3389/fmed.2025.1545730. eCollection 2025.
2. AI-powered standardised patients: evaluating ChatGPT-4o's impact on clinical case management in intern physicians.
BMC Med Educ. 2025 Feb 20;25(1):278. doi: 10.1186/s12909-025-06877-6.
3. Patient Triage and Guidance in Emergency Departments Using Large Language Models: Multimetric Study.
J Med Internet Res. 2025 May 15;27:e71613. doi: 10.2196/71613.
4. ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis.
JMIR Med Educ. 2024 Nov 6;10:e63430. doi: 10.2196/63430.
5. Assessing the accuracy and clinical utility of GPT-4O in abnormal blood cell morphology recognition.
Digit Health. 2024 Nov 5;10:20552076241298503. doi: 10.1177/20552076241298503. eCollection 2024 Jan-Dec.
6. An assessment of ChatGPT in error detection for thyroid ultrasound reports: A comparative study with ultrasound physicians.
Digit Health. 2025 Mar 13;11:20552076251326019. doi: 10.1177/20552076251326019. eCollection 2025 Jan-Dec.
7. Generative pre-trained transformer 4o (GPT-4o) in solving text-based multiple response questions for European Diploma in Radiology (EDiR): a comparative study with radiologists.
Insights Imaging. 2025 Mar 22;16(1):66. doi: 10.1186/s13244-025-01941-7.
8. Assessing ChatGPT for Clinical Decision-Making in Radiation Oncology, With Open-Ended Questions and Images.
Pract Radiat Oncol. 2025 Apr 29. doi: 10.1016/j.prro.2025.04.009.
9. GPT-4o's competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study.
J Educ Eval Health Prof. 2024;21:21. doi: 10.3352/jeehp.2024.21.21. Epub 2024 Aug 20.
10. Exploring ChatGPT's potential for augmenting post-editing in machine translation across multiple domains: challenges and opportunities.
Front Artif Intell. 2025 May 1;8:1526293. doi: 10.3389/frai.2025.1526293. eCollection 2025.
