Goto Hiroki, Shiraishi Yoshioki, Okada Seiji
Radioisotope and Tumor Pathobiology, Institute of Resource Development and Analysis, Kumamoto University, Kumamoto, JPN.
Radioisotope Center, Institute of Resource Development and Analysis, Kumamoto University, Kumamoto, JPN.
Cureus. 2024 Nov 22;16(11):e74262. doi: 10.7759/cureus.74262. eCollection 2024 Nov.
Purpose The purpose of this study was to assess the ability of large language models (LLMs) to comprehend the safety management, protection methods, and proper handling of X-rays as required by laws and regulations. We evaluated the performance of GPT-4o (OpenAI, San Francisco, CA, USA) and o1-preview (OpenAI) using questions from the 'Operations Chief of Radiography With X-rays' certification examination in Japan. Methods GPT-4o and o1-preview were prompted to answer questions from this Japanese certification examination. A total of four sets of exams published from April 2023 to October 2024 were used. The accuracy of each model was evaluated across four subjects: knowledge about the control of X-rays, relevant laws and regulations, knowledge about the measurement of X-rays, and knowledge about the effects of X-rays on organisms. The results of the two models were compared, excluding graphical questions because o1-preview cannot interpret images. Results The overall accuracy rates of GPT-4o and o1-preview ranged from 57.5% to 70.0% and from 71.1% to 86.5%, respectively. GPT-4o achieved passing accuracy rates in all subjects except relevant laws and regulations. In contrast, o1-preview met the passing criteria across all four exam sets, despite graphical questions being excluded from scoring. The accuracy of o1-preview was significantly higher than that of GPT-4o for all questions and for relevant laws and regulations (p = 0.03 for each comparison). No significant differences in accuracy were found for the other subjects. Conclusions In the Japanese 'Operations Chief of Radiography With X-rays' certification examination, GPT-4o demonstrated competent performance in all subjects except relevant laws and regulations, while o1-preview showed commendable performance across all subjects. With graphical questions excluded from scoring, o1-preview outperformed GPT-4o for all questions and for relevant laws and regulations.
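The abstract does not specify which statistical test the authors used to compare the two models, so the following is a minimal, hypothetical Python sketch of how such an accuracy comparison could be carried out, assuming question-level correct/incorrect counts and a chi-squared test; the function name and all counts are placeholders, not the study's data or method.

```python
# Hypothetical sketch (not the authors' code): comparing question-level
# accuracy of two models with a chi-squared test on pooled counts.
# The abstract does not state which statistical test was actually used,
# and the counts passed in below are placeholders, not the study's data.
from scipy.stats import chi2_contingency

def compare_accuracy(correct_a, total_a, correct_b, total_b):
    """Return both accuracies and a chi-squared p-value for the 2x2 table."""
    table = [
        [correct_a, total_a - correct_a],  # model A: correct, incorrect
        [correct_b, total_b - correct_b],  # model B: correct, incorrect
    ]
    chi2, p, dof, expected = chi2_contingency(table)
    return correct_a / total_a, correct_b / total_b, p

# Illustration with made-up counts only.
acc_a, acc_b, p = compare_accuracy(104, 160, 128, 160)
print(f"model A accuracy: {acc_a:.1%}, model B accuracy: {acc_b:.1%}, p = {p:.3f}")
```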