Performance evaluation of GPT-4o on South Korean national exams for building mechanical equipment maintenance.

Authors

Choi Haneul, Lee Jehyun, Kim Jonghun

Affiliations

Energy ICT Research Department, Korea Institute of Energy Research, Daejeon, 34129, South Korea.

Energy AI & Computational Science Laboratory, Korea Institute of Energy Research, Daejeon, 34129, South Korea.

Publication information

Sci Rep. 2025 Aug 19;15(1):30436. doi: 10.1038/s41598-025-16118-x.

DOI: 10.1038/s41598-025-16118-x
PMID: 40830641
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12365024/
Abstract

This study evaluates the applicability of large language models (LLMs) in mechanical equipment maintenance in buildings by assessing GPT-4o's performance on two national certification exams in South Korea: Engineer Energy Management (EEM) and Engineer Air-Conditioning Refrigerating Machinery (EACRM). GPT-4o achieved average scores of 80.6 and 81.25 on the EEM and EACRM exams, respectively, passing all five attempts. The model performed well on both non-calculation and calculation problems and demonstrated high consistency, with an average response consistency of 97%. Despite these strengths, three key limitations were identified: weak advanced reasoning, difficulty in solving legal questions, and poor interpretation of scientific figures. Experimental results indicate that advanced reasoning can be improved using reasoning-optimized models, while legal question accuracy can be significantly enhanced with retrieval-augmented generation (RAG). However, figure interpretation remains dependent on advancements in visual recognition capabilities. These findings suggest that GPT-4o possesses foundational knowledge applicable to mechanical equipment maintenance in buildings but also highlight the need to address certain limitations for practical implementation. This study provides a foundation for future research on integrating LLMs into industrial applications, such as maintenance management software, to enhance maintenance efficiency and address workforce shortages.
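The abstract reports average exam scores and an average response consistency but does not say how either figure was computed. The sketch below shows one plausible way to derive such numbers from repeated multiple-choice attempts. It is an illustration under stated assumptions, not the authors' procedure: the function names `exam_score` and `response_consistency`, the definition of consistency as the share of attempts agreeing with the most common answer per question, and the toy answer data are all hypothetical.

```python
from collections import Counter


def exam_score(answers, key):
    """Percentage of questions answered correctly in a single attempt."""
    correct = sum(a == k for a, k in zip(answers, key))
    return 100.0 * correct / len(key)


def response_consistency(attempts):
    """Average, over questions, of the share of attempts that give the
    most common answer to that question (assumed definition)."""
    shares = []
    for per_question in zip(*attempts):  # all answers given to one question
        top_count = Counter(per_question).most_common(1)[0][1]
        shares.append(top_count / len(attempts))
    return 100.0 * sum(shares) / len(shares)


# Toy data (made up for illustration): 4 questions, 5 repeated attempts.
key = ["A", "C", "B", "D"]
attempts = [
    ["A", "C", "B", "A"],
    ["A", "C", "B", "D"],
    ["A", "C", "D", "D"],
    ["A", "C", "B", "D"],
    ["A", "C", "B", "D"],
]

scores = [exam_score(a, key) for a in attempts]
mean_score = sum(scores) / len(scores)
consistency = response_consistency(attempts)
print(f"mean score: {mean_score:.2f}, consistency: {consistency:.1f}%")
```

Under this assumed definition a model can be highly consistent while still being wrong, so consistency and score are worth reporting separately, as the abstract does.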

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/91dda0718697/41598_2025_16118_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/9f98d2f26805/41598_2025_16118_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/5e7ab03997e8/41598_2025_16118_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/676b0ff56dc7/41598_2025_16118_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/d532d032845b/41598_2025_16118_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/91f6907100bb/41598_2025_16118_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/a0e958c1d3e0/41598_2025_16118_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/2f7334390b8f/41598_2025_16118_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/e8b693c74ddf/41598_2025_16118_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/328289f351f0/41598_2025_16118_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/d67d6cc73494/41598_2025_16118_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/06f5bb67b886/41598_2025_16118_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/2296976c90cd/41598_2025_16118_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/d91f6824e7da/41598_2025_16118_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/d14a45aa923f/41598_2025_16118_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9cb1/12365024/1623c1da4982/41598_2025_16118_Fig16_HTML.jpg
