ChatGPT provides inconsistent risk-stratification of patients with atraumatic chest pain.

Affiliations

Department of Family Medicine, University of Washington School of Medicine, Seattle, Washington, United States of America.

Department of Medical Education and Clinical Sciences, Washington State University, Spokane, Washington, United States of America.

Publication information

PLoS One. 2024 Apr 16;19(4):e0301854. doi: 10.1371/journal.pone.0301854. eCollection 2024.

Abstract

BACKGROUND

ChatGPT-4 is a large language model with promising healthcare applications. However, its ability to analyze complex clinical data and produce consistent results is poorly characterized. This study evaluated ChatGPT-4's risk stratification of simulated patients with acute nontraumatic chest pain against validated tools.

METHODS

Three datasets of simulated case studies were created: one based on the TIMI score variables, another on the HEART score variables, and a third comprising 44 randomized variables related to nontraumatic chest pain presentations. ChatGPT-4 independently scored each dataset five times. Its risk scores were compared to the calculated TIMI and HEART scores, and the consistency of its scoring across the five runs of the 44-variable dataset was evaluated.
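For concreteness, one of the validated comparators, the HEART score, is computed deterministically from five components (History, ECG, Age, Risk factors, Troponin), which is what makes repeated scoring of identical cases a fair consistency test. A minimal sketch, assuming the published HEART point thresholds; function names and the input encoding are illustrative, not the study's code:

```python
def heart_score(history, ecg, age, risk_factors, troponin_ratio):
    """Compute the HEART score (0-10).

    history, ecg: clinician-rated component points (0, 1, or 2).
    age: patient age in years.
    risk_factors: count of cardiovascular risk factors (known
        atherosclerotic disease can be encoded as a count >= 3).
    troponin_ratio: measured troponin / upper limit of normal.
    """
    age_pts = 0 if age < 45 else (1 if age < 65 else 2)
    rf_pts = 0 if risk_factors == 0 else (1 if risk_factors <= 2 else 2)
    trop_pts = 0 if troponin_ratio <= 1 else (1 if troponin_ratio <= 3 else 2)
    return history + ecg + age_pts + rf_pts + trop_pts


def heart_risk_band(score):
    """Map a HEART score to its conventional risk band."""
    if score <= 3:
        return "low"       # 0-3: low risk
    if score <= 6:
        return "moderate"  # 4-6: moderate risk
    return "high"          # 7-10: high risk
```

Because the mapping from inputs to band is a fixed rule, any two runs on the same case must agree; the study's question is whether ChatGPT-4 shows the same property.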

RESULTS

ChatGPT-4's scores correlated strongly with TIMI and HEART scores (r = 0.898 and 0.928, respectively), but its individual risk assessments were broadly distributed: for a fixed TIMI or HEART score, ChatGPT-4 assigned a different risk level 45-48% of the time. On the 44-variable dataset, a majority of the five ChatGPT-4 runs agreed on a diagnosis category only 56% of the time, and risk scores between runs were poorly correlated (r = 0.605).
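The two consistency measures reported above — Pearson correlation between score sets and majority agreement on a diagnosis category across repeated runs — can be sketched as follows. This is a minimal illustration of the metrics, not the study's analysis code:

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def majority_agreement_rate(runs):
    """Fraction of cases on which a strict majority of repeated runs
    assigned the same category.

    runs: list of per-run category lists, one category per case,
    e.g. five runs over the same simulated cases.
    """
    n_runs = len(runs)
    agree = 0
    for case in zip(*runs):          # one tuple of categories per case
        top = max(case.count(c) for c in set(case))
        agree += top > n_runs / 2    # strict majority required
    return agree / len(runs[0])
```

With five runs, `majority_agreement_rate` counts a case as consistent when at least three runs pick the same category, matching the 56% agreement figure's framing.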

CONCLUSION

While ChatGPT-4 correlates closely with established risk stratification tools regarding mean scores, its inconsistency when presented with identical patient data on separate occasions raises concerns about its reliability. The findings suggest that while large language models like ChatGPT-4 hold promise for healthcare applications, further refinement and customization are necessary, particularly in the clinical risk assessment of atraumatic chest pain patients.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d795/11020975/bcaa915b1ada/pone.0301854.g001.jpg
