Sawamura Shogo, Kohiyama Kengo, Takenaka Takahiro, Sera Tatsuya, Inoue Tadatoshi, Nagai Takashi
Department of Rehabilitation, Heisei College of Health Sciences, Gifu, JPN.
Cureus. 2025 Feb 17;17(2):e79183. doi: 10.7759/cureus.79183. eCollection 2025 Feb.
Introduction This study explored the potential of using large language models (LLMs) to generate multiple-choice questions (MCQs) for the Japanese National Licensure Examination for Physical Therapists. Specifically, it evaluated the performance of a customized ChatGPT (OpenAI, San Francisco, CA, USA) model named "Physio Exam Generative Pre-trained Transformers (GPT)" in generating high-quality MCQs in non-English contexts.

Materials and methods Based on data extracted from the 57th and 58th Japanese National Licensure Examinations for Physical Therapists, 340 MCQs, including correct answers, explanations, and associated topics, were incorporated into the knowledge base of the GPTs. Both prompts and outputs were in Japanese. The generated MCQs covered major topics in both general questions (anatomy, physiology, and kinesiology) and practical questions (musculoskeletal disorders, central nervous system disorders, and internal organ disorders). The quality of the MCQs and their explanations was evaluated by two independent reviewers using a 10-point Likert scale across five criteria: clarity, relevance to clinical practice, suitability of difficulty, quality of distractors, and adequacy of rationale.

Results The generated MCQs achieved 100% accuracy for both general and practical questions. The average scores across the evaluation criteria ranged from 7.0 to 9.8 for general questions and 6.7 to 9.8 for practical questions. Although some areas exhibited lower scores, the overall results were favorable.

Conclusions This study demonstrates the potential of LLMs to efficiently generate high-quality MCQs, even in non-English environments such as Japanese. These findings suggest that LLMs can adapt to diverse linguistic settings, reduce educators' workload, and improve the quality of educational resources. These results lay a foundation for expanding the application of LLMs to educational settings across non-English-speaking regions.
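The per-criterion averaging described in the methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the ratings and variable names are hypothetical, and only the scale (1–10 Likert), the five criteria, and the two-reviewer design come from the abstract.

```python
from statistics import mean

# The five evaluation criteria reported in the study.
criteria = ["clarity", "clinical_relevance", "difficulty_suitability",
            "distractor_quality", "rationale_adequacy"]

# Hypothetical 10-point Likert ratings from the two independent
# reviewers for a single generated MCQ (illustrative values only).
reviewer_a = {"clarity": 9, "clinical_relevance": 8, "difficulty_suitability": 7,
              "distractor_quality": 8, "rationale_adequacy": 9}
reviewer_b = {"clarity": 8, "clinical_relevance": 7, "difficulty_suitability": 7,
              "distractor_quality": 6, "rationale_adequacy": 9}

# Average the two reviewers' scores per criterion, as the reported
# 6.7-9.8 ranges imply was done across the question set.
avg_scores = {c: mean([reviewer_a[c], reviewer_b[c]]) for c in criteria}
print(avg_scores)
```

Repeating this over every generated question and then averaging within each criterion yields the per-criterion score ranges reported in the results.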