Department of Clinical Neurosciences, Division of Neurosurgery, Geneva University Hospitals, Geneva, Switzerland
Department of Clinical Neurosciences, Division of Neurology, Geneva University Hospitals, Geneva, Switzerland.
BMJ Health Care Inform. 2023 Jun;30(1). doi: 10.1136/bmjhci-2023-100775.
To evaluate ChatGPT's performance in brain glioma adjuvant therapy decision-making.
We randomly selected 10 patients with brain gliomas discussed at our institution's central nervous system tumour board (CNS TB). Each patient's clinical status, surgical outcome, textual imaging information and immuno-pathology results were provided to ChatGPT V.3.5 and seven CNS tumour experts. The chatbot was asked to recommend an adjuvant treatment and a regimen, taking the patient's functional status into account. The experts rated the artificial intelligence-based recommendations from 0 (complete disagreement) to 10 (complete agreement). The intraclass correlation coefficient (ICC) was used to measure inter-rater agreement.
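As an illustration of the agreement measure, the following is a minimal sketch of one common ICC variant: the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The abstract does not state which ICC form or software was used, so this choice is an assumption. The function name `icc_agreement` and the example scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def icc_agreement(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC, i.e. ICC(2,1).

    `ratings` holds one row per rated case and one column per rater.
    Assumption: the study's exact ICC form is not given in the abstract.
    """
    n = len(ratings)       # number of rated cases
    k = len(ratings[0])    # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with >= 2 cases and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_err) / denom


# Hypothetical 0-10 scores: 4 cases (rows) rated by 3 experts (columns).
example = [
    [7, 8, 7],
    [3, 2, 4],
    [6, 6, 5],
    [9, 8, 9],
]
print(round(icc_agreement(example), 2))
```

Values near 1 indicate that raters give nearly the same score to each case, while values near 0 indicate that disagreement between raters is as large as the differences between cases.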
Eight patients (80%) met the criteria for glioblastoma and two (20%) had low-grade gliomas. The experts rated the quality of ChatGPT recommendations as poor for diagnosis (median 3, IQR 1-7.8, ICC 0.9, 95% CI 0.7 to 1.0), good for treatment recommendation (7, IQR 6-8, ICC 0.8, 95% CI 0.4 to 0.9), good for therapy regimen (7, IQR 4-8, ICC 0.8, 95% CI 0.5 to 0.9), moderate for functional status consideration (6, IQR 1-7, ICC 0.7, 95% CI 0.3 to 0.9) and moderate for overall agreement with the recommendations (5, IQR 3-7, ICC 0.7, 95% CI 0.3 to 0.9). No differences were observed between the ratings for glioblastomas and those for low-grade gliomas.
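The median and IQR summaries reported above can be reproduced for any list of scores with a short sketch such as the one below. It assumes linearly interpolated quartiles (the `inclusive` method of Python's `statistics.quantiles`); the abstract does not say which quartile definition the authors used, and the helper name `median_iqr` and the sample scores are hypothetical.

```python
from statistics import median, quantiles
from typing import Sequence, Tuple


def median_iqr(scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, first quartile, third quartile) of the scores.

    Assumption: quartiles use linear interpolation between data points;
    the study's quartile definition is not stated in the abstract.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


# Hypothetical 0-10 expert scores for one rated item.
sample_scores = [1, 3, 3, 7, 8, 2, 9, 0, 5, 3]
med, q1, q3 = median_iqr(sample_scores)
print(f"median {med}, IQR {q1}-{q3}")
```

A wide IQR, such as the 1-7.8 reported for diagnosis, means the scores were spread across most of the 0 to 10 scale even when the median was low.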
ChatGPT performed poorly in classifying glioma types but performed well in adjuvant treatment recommendations, as evaluated by CNS TB experts. Although ChatGPT lacks the precision to replace expert opinion, it may serve as a promising supplemental tool within a human-in-the-loop approach.