Odabashian Roupen, Bastin Donald, Jones Georden, Manzoor Maria, Tangestaniapour Sina, Assad Malke, Lakhani Sunita, Odabashian Maritsa, McGee Sharon
Department of Oncology, Barbara Ann Karmanos Cancer Institute, Wayne State University, Detroit, MI, United States.
Department of Medicine, Division of Internal Medicine, The Ottawa Hospital and the University of Ottawa, Ottawa, ON, Canada.
JMIR AI. 2024 Jan 12;3:e50442. doi: 10.2196/50442.
ChatGPT (OpenAI) is a state-of-the-art large language model that uses artificial intelligence (AI) to address questions across diverse topics. The American Society of Clinical Oncology created the Self-Evaluation Program (ASCO-SEP), a comprehensive educational resource that helps physicians keep up to date with the many rapid advances in the field. Its question bank consists of multiple-choice questions addressing the many facets of cancer care, including diagnosis, treatment, and supportive care. As applications of ChatGPT rapidly expand, it becomes vital to ascertain whether the knowledge of ChatGPT-3.5 matches the established standards that oncologists are recommended to follow.
This study aims to evaluate whether ChatGPT-3.5's knowledge aligns with the established benchmarks that oncologists are expected to meet, providing a deeper understanding of this tool's potential as a support for clinical decision-making.
We conducted a systematic assessment of the performance of ChatGPT-3.5 on the ASCO-SEP, the leading educational and assessment tool for medical oncologists in training and practice. Over 1000 multiple-choice questions covering the spectrum of cancer care were extracted. Questions were categorized by cancer type or discipline and subcategorized as treatment, diagnosis, or other. An answer was scored as correct only if ChatGPT-3.5 selected the option defined as correct by ASCO-SEP.
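The scoring procedure described above reduces to tallying, per category, how often the model's selected option matches the ASCO-SEP answer key. The sketch below is a minimal illustration of that tallying logic, not the authors' pipeline: `questions` (a list of dicts with `stem`, `options`, `answer`, and `category` fields) and `ask_model` (a stand-in for however ChatGPT-3.5 was queried) are hypothetical names introduced here.

```python
# Minimal sketch (assumed names, not the study's actual code): tally
# correct/total answers per question category, counting a response as
# correct only when the model selects the ASCO-SEP-defined option.
from collections import defaultdict

def score_questions(questions, ask_model):
    """Return {category: {"correct": int, "total": int}} tallies."""
    tallies = defaultdict(lambda: {"correct": 0, "total": 0})
    for q in questions:
        choice = ask_model(q["stem"], q["options"])  # e.g. returns "B"
        t = tallies[q["category"]]
        t["total"] += 1
        t["correct"] += int(choice == q["answer"])
    return tallies
```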
Overall, ChatGPT-3.5 answered 56.1% (583/1040) of the questions correctly. Accuracy varied across cancer types or disciplines, from a high of 80% (8/10) on questions related to developmental therapeutics to a low of 48.8% (102/209) on questions related to gastrointestinal cancer. There was no significant difference in performance across the predefined subcategories of diagnosis, treatment, and other (P=.16).
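A comparison of correct versus incorrect counts across the three subcategories, as reported above, is commonly done with a chi-square test of independence. The sketch below shows that test on an illustrative contingency table; the per-subcategory split is invented (only the overall 583/1040 total matches the abstract), so it will not reproduce the reported P=.16.

```python
# Minimal sketch of the subcategory comparison: a chi-square test of
# independence on correct/incorrect counts. The row values are placeholder
# splits of the overall 583 correct / 457 incorrect totals, not study data.
from scipy.stats import chi2_contingency

observed = [
    [180, 140],  # diagnosis: [correct, incorrect] (hypothetical)
    [300, 240],  # treatment (hypothetical)
    [103, 77],   # other (hypothetical)
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, dof={dof}, P={p_value:.2f}")
# A P value above .05, as the study reports (P=.16), indicates no
# significant accuracy difference across the subcategories.
```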
This study evaluated ChatGPT-3.5's oncology knowledge using the ASCO-SEP, aiming to address uncertainties surrounding AI tools such as ChatGPT in clinical decision-making. Our findings suggest that while ChatGPT-3.5 shows promise for AI in oncology, its present performance on the ASCO-SEP falls short of the requisite competency level and requires further refinement. Future assessments could explore ChatGPT's clinical decision support capabilities in real-world clinical scenarios, its ease of integration into medical workflows, and its potential to foster interdisciplinary collaboration and patient engagement in health care settings.