Chat-GPT on brain tumors: An examination of Artificial Intelligence/Machine Learning's ability to provide diagnoses and treatment plans for example neuro-oncology cases.

Affiliations

Warren Alpert Medical School, Brown University, Providence, RI, USA.

Department of Neurosurgery, Miller School of Medicine, University of Miami, Miami, FL, USA.

Publication information

Clin Neurol Neurosurg. 2024 Apr;239:108238. doi: 10.1016/j.clineuro.2024.108238. Epub 2024 Mar 9.

DOI:10.1016/j.clineuro.2024.108238

PMID:38507989

Abstract

OBJECTIVE

Assess the capabilities of ChatGPT-3.5 and 4 to provide accurate diagnoses, treatment options, and treatment plans for brain tumors in example neuro-oncology cases.

METHODS

ChatGPT-3.5 and 4 were provided with twenty example neuro-oncology cases of brain tumors, all selected from medical textbooks. The artificial intelligence programs were asked to give a diagnosis, treatment option, and treatment plan for each of these twenty example cases. Team members first determined in which cases ChatGPT-3.5 and 4 provided the correct diagnosis or treatment plan. Twenty neurosurgeons from the researchers' institution then independently rated the diagnoses, treatment options, and treatment plans provided by both artificial intelligence programs for each of the twenty example cases, on a scale of one to ten, with ten being the highest score. To determine whether the difference between the scores of ChatGPT-3.5 and 4 was statistically significant, a paired t-test was conducted for the average scores given to the programs for each example case.
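The paired t-test described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' analysis code. The function name `paired_t_statistic` and the four pairs of per-case mean ratings are hypothetical; the abstract does not publish per-case scores. The sketch returns the t statistic and degrees of freedom only; turning these into a p-value requires the t-distribution, for example through `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two equal-length lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Work on the per-case differences, which is what makes the test "paired".
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-case mean ratings for four cases (NOT data from the study).
gpt4_means = [8.0, 9.0, 7.0, 8.0]
gpt35_means = [6.0, 6.0, 6.0, 5.0]

t_stat, dof = paired_t_statistic(gpt4_means, gpt35_means)
print(f"t = {t_stat:.3f}, df = {dof}")
```

In the study's design, each list would hold twenty values, one per example case, each value being the mean of the neurosurgeons' ratings for that case, giving 19 degrees of freedom.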

RESULTS

In the initial analysis of correct responses, ChatGPT-4 had an accuracy of 85% for its diagnoses of example brain tumors and 75% for its treatment plans, while ChatGPT-3.5 had accuracies of only 65% and 10%, respectively. The average scores given by the twenty independent neurosurgeons to ChatGPT-4 for its diagnosis, treatment options, and treatment plan were 8.3, 8.4, and 8.5 out of 10, respectively, while ChatGPT-3.5's average scores in the same categories were 5.9, 5.7, and 5.7. Each of these differences in average score was statistically significant on a paired t-test, with a p-value of less than 0.001.
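As a quick arithmetic check, the reported accuracies can be converted back into case counts. The sketch below assumes that every percentage is taken over all twenty example cases, which the abstract implies but does not state outright; the variable names are illustrative.

```python
TOTAL_CASES = 20  # number of example cases stated in the Methods

# Accuracy percentages as reported in the Results.
reported_accuracy = {
    ("ChatGPT-4", "diagnosis"): 85,
    ("ChatGPT-4", "treatment plan"): 75,
    ("ChatGPT-3.5", "diagnosis"): 65,
    ("ChatGPT-3.5", "treatment plan"): 10,
}

correct_counts = {}
for key, percent in reported_accuracy.items():
    # Each percentage should correspond to a whole number of cases.
    assert (percent * TOTAL_CASES) % 100 == 0, f"{key} is not a whole case count"
    correct_counts[key] = percent * TOTAL_CASES // 100

for (model, task), count in correct_counts.items():
    print(f"{model} {task}: {count} of {TOTAL_CASES} cases")
```

Under that assumption, the gap in treatment-plan accuracy corresponds to 15 correct cases for ChatGPT-4 versus 2 for ChatGPT-3.5.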

CONCLUSIONS

ChatGPT-4 demonstrates great promise as a diagnostic tool for brain tumors in neuro-oncology, as attested to by the program's performance in this study and its assessment by surveyed neurosurgeon reviewers.
