Choi Haneul, Lee Jehyun, Kim Jonghun
Energy ICT Research Department, Korea Institute of Energy Research, Daejeon, 34129, South Korea.
Energy AI & Computational Science Laboratory, Korea Institute of Energy Research, Daejeon, 34129, South Korea.
Sci Rep. 2025 Aug 19;15(1):30436. doi: 10.1038/s41598-025-16118-x.
This study evaluates the applicability of large language models (LLMs) in mechanical equipment maintenance in buildings by assessing GPT-4o's performance on two national certification exams in South Korea: Engineer Energy Management (EEM) and Engineer Air-Conditioning Refrigerating Machinery (EACRM). GPT-4o achieved average scores of 80.6 and 81.25 on the EEM and EACRM exams, respectively, passing all five attempts. The model performed well on both non-calculation and calculation problems and demonstrated high consistency, with an average response consistency of 97%. Despite these strengths, three key limitations were identified: weak advanced reasoning, difficulty in solving legal questions, and poor interpretation of scientific figures. Experimental results indicate that advanced reasoning can be improved using reasoning-optimized models, while legal question accuracy can be significantly enhanced with retrieval-augmented generation (RAG). However, figure interpretation remains dependent on advancements in visual recognition capabilities. These findings suggest that GPT-4o possesses foundational knowledge applicable to mechanical equipment maintenance in buildings but also highlight the need to address certain limitations for practical implementation. This study provides a foundation for future research on integrating LLMs into industrial applications, such as maintenance management software, to enhance maintenance efficiency and address workforce shortages.