Evaluation of error detection and treatment recommendations in nucleic acid test reports using ChatGPT models.

Author information

Han Wenzheng, Wan Chao, Shan Rui, Xu Xudong, Chen Guang, Zhou Wenjie, Yang Yuxuan, Feng Gang, Li Xiaoning, Yang Jianghua, Jin Kai, Chen Qing

Affiliations

The First Affiliated Hospital, Wannan Medical College, Wuhu, Anhui, China.

Wannan Medical College, Wuhu, Anhui, China.

Publication information

Clin Chem Lab Med. 2025 Apr 21. doi: 10.1515/cclm-2025-0089.

DOI:10.1515/cclm-2025-0089
PMID:40249886
Abstract

OBJECTIVES

Accurate medical laboratory reports are essential for delivering high-quality healthcare. Recently, advanced artificial intelligence models, such as those in the ChatGPT series, have shown considerable promise in this domain. This study assessed the performance of three GPT models (4o, o1, and o1 mini) in identifying errors within medical laboratory reports and in providing treatment recommendations.

METHODS

In this retrospective study, 86 nucleic acid test reports for seven upper respiratory tract pathogens were compiled. A total of 285 errors from four common error categories were intentionally and randomly introduced into the reports, generating 86 incorrect reports. GPT models were tasked with detecting these errors, with three senior medical laboratory scientists (SMLS) and three medical laboratory interns (MLI) serving as control groups. Additionally, GPT models were tasked with generating accurate and reliable treatment recommendations for positive test results based on the 86 corrected reports. χ2 tests, Kruskal-Wallis tests, and Wilcoxon tests were used for statistical analysis where appropriate.
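The abstract does not say which comparisons used which test. As a minimal sketch, assuming a χ2 test was applied to detected versus missed error counts for two rater groups, the Pearson statistic for a 2x2 table can be computed as follows. The function name `chi_square_2x2` and the counts in the usage line are hypothetical illustrations, not data from the study, and no continuity correction is applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = rater groups, columns = detected / missed.
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability equals
    # erfc(sqrt(statistic / 2)), because the statistic is a squared z-score.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts only: group A detects 90 of 100 errors, group B 70 of 100.
stat, p = chi_square_2x2(90, 10, 70, 30)
print(f"chi2 = {stat:.2f}, p = {p:.5f}")
```

For small expected cell counts an exact test would usually be preferred over this approximation.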

RESULTS

Compared with SMLS and MLI, GPT models accurately detected three error types; the average detection rates across the three GPT models were 88.9 % (omission), 91.6 % (time sequence), and 91.7 % (the same individual acting as both inspector and reviewer). However, the average detection rate for errors in the result input format was only 51.9 %, indicating relatively poor performance on this error type. GPT models showed substantial to almost perfect agreement with SMLS in detecting total errors (kappa [min, max]: 0.778, 0.837), whereas agreement between GPT models and MLI was moderately lower (kappa [min, max]: 0.632, 0.696). GPT models took significantly less time than SMLS or MLI to read all 86 reports (all p<0.001). Notably, GPT-o1 mini identified errors more consistently than GPT-o1, which in turn was more consistent than GPT-4o. Pairwise comparisons of each GPT model's outputs across three repeated runs showed almost perfect agreement (kappa [min, max]: 0.912, 0.996). GPT-o1 mini also had a significantly shorter reading time than GPT-4o or GPT-o1 (all p<0.001). Additionally, GPT-o1 significantly outperformed GPT-4o and o1 mini in providing accurate and reliable treatment recommendations (all p<0.0001).
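To make the kappa values above concrete, here is a minimal sketch of Cohen's kappa for two raters who each mark an item as an error (1) or not (0). It assumes the unweighted two-rater form of kappa and the Landis and Koch descriptive bands, which match the "substantial" and "almost perfect" wording in the abstract; the abstract itself does not state which variant or scale was used. The names `cohen_kappa` and `landis_koch_label` and the two toy label lists are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Descriptive band for a kappa value (Landis and Koch convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy labels only (not study data): 1 = flagged as an error, 0 = not flagged.
model_flags = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
expert_flags = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
k = cohen_kappa(model_flags, expert_flags)
print(f"kappa = {k:.3f} ({landis_koch_label(k)})")
```

Under these bands, the reported GPT versus SMLS range (0.778 to 0.837) spans "substantial" to "almost perfect", and the GPT versus MLI range (0.632 to 0.696) falls within "substantial".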

CONCLUSIONS

GPT models were competent at detecting some types of medical laboratory report errors and at providing accurate and reliable treatment recommendations, and could potentially reduce work hours and enhance clinical decision-making.
