
Performance of Generative Pre-trained Transformer (GPT)-4 and Gemini Advanced on the First-Class Radiation Protection Supervisor Examination in Japan.

Author Information

Goto Hiroki, Shiraishi Yoshioki, Okada Seiji

Institution Information

Radioisotope and Tumor Pathobiology, Institute of Resource Development and Analysis, Kumamoto University, Kumamoto, JPN.

Radioisotope Center, Institute of Resource Development and Analysis, Kumamoto University, Kumamoto, JPN.

Publication Information

Cureus. 2024 Oct 1;16(10):e70614. doi: 10.7759/cureus.70614. eCollection 2024 Oct.

Abstract

Purpose: The purpose of this study was to evaluate the capabilities of large language models (LLMs) in understanding radiation safety and protection. We assessed the performance of generative pre-trained transformer (GPT)-4 (OpenAI, USA) and Gemini Advanced (Google DeepMind, London) using questions from the First-Class Radiation Protection Supervisor Examination in Japan.

Methods: GPT-4 and Gemini Advanced answered questions from the 68th First-Class Radiation Protection Supervisor Examination in Japan. The numbers of correct and incorrect answers were analyzed by subject, presence or absence of calculation, passage length, and question format (textual or graphical). The results of GPT-4 and Gemini Advanced were compared.

Results: The overall accuracy rates of GPT-4 and Gemini Advanced were 71.0% and 65.3%, respectively. A significant difference was observed across subjects (P < 0.0001 for GPT-4 and P = 0.0127 for Gemini Advanced); the accuracy rate for laws and regulations was lower than for the other subjects. There was no significant difference by presence or absence of calculation or by passage length. Both LLMs performed significantly better on textual questions than on graphical questions (P = 0.0003 for GPT-4 and P < 0.0001 for Gemini Advanced). The performance of the two LLMs did not differ significantly by subject, presence or absence of calculation, passage length, or format.

Conclusions: GPT-4 and Gemini Advanced demonstrated sufficient understanding of physics, chemistry, biology, and practical operations to meet the passing standard for the average score. However, their performance on laws and regulations was insufficient, possibly due to frequent revisions and the complexity of detailed regulations, and further machine learning is required.
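The abstract reports accuracy comparisons by subject and question format but does not name the statistical test used. As an illustration only, the sketch below shows how two models' overall accuracy rates could be compared with a chi-squared test on a 2x2 contingency table; the counts are hypothetical placeholders chosen to mirror the reported 71.0% and 65.3% rates, not the study's data.

```python
# Minimal sketch (not from the paper): comparing two models' accuracy rates
# with a chi-squared test on a 2x2 contingency table.
# Counts are hypothetical placeholders, chosen only to mirror the reported
# 71.0% (GPT-4) and 65.3% (Gemini Advanced) overall accuracies.
from scipy.stats import chi2_contingency

#               correct  incorrect
contingency = [[71,      29],    # GPT-4 (hypothetical: 71/100)
               [65,      35]]    # Gemini Advanced (hypothetical: 65/100)

chi2, p, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```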


