Performance evaluation of GPT-4o on South Korean national exams for building mechanical equipment maintenance.

Author information

Choi Haneul, Lee Jehyun, Kim Jonghun

Affiliations

Energy ICT Research Department, Korea Institute of Energy Research, Daejeon, 34129, South Korea.

Energy AI & Computational Science Laboratory, Korea Institute of Energy Research, Daejeon, 34129, South Korea.

Publication information

Sci Rep. 2025 Aug 19;15(1):30436. doi: 10.1038/s41598-025-16118-x.

DOI:10.1038/s41598-025-16118-x

PMID:40830641

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12365024/

Abstract

This study evaluates the applicability of large language models (LLMs) in mechanical equipment maintenance in buildings by assessing GPT-4o's performance on two national certification exams in South Korea: Engineer Energy Management (EEM) and Engineer Air-Conditioning Refrigerating Machinery (EACRM). GPT-4o achieved average scores of 80.6 and 81.25 on the EEM and EACRM exams, respectively, passing all five attempts. The model performed well on both non-calculation and calculation problems and demonstrated high consistency, with an average response consistency of 97%. Despite these strengths, three key limitations were identified: weak advanced reasoning, difficulty in solving legal questions, and poor interpretation of scientific figures. Experimental results indicate that advanced reasoning can be improved using reasoning-optimized models, while legal question accuracy can be significantly enhanced with retrieval-augmented generation (RAG). However, figure interpretation remains dependent on advancements in visual recognition capabilities. These findings suggest that GPT-4o possesses foundational knowledge applicable to mechanical equipment maintenance in buildings but also highlight the need to address certain limitations for practical implementation. This study provides a foundation for future research on integrating LLMs into industrial applications, such as maintenance management software, to enhance maintenance efficiency and address workforce shortages.

