
Comparing Large Language Models and Human Programmers for Generating Programming Code.

Author information

Hou Wenpin, Ji Zhicheng

Affiliations

Department of Biostatistics, Mailman School of Public Health, Columbia University, New York City, NY, 10032, USA.

Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, 07024, USA.

Publication information

Adv Sci (Weinh). 2025 Feb;12(8):e2412279. doi: 10.1002/advs.202412279. Epub 2024 Dec 30.

Abstract

This study systematically evaluates the performance of seven large language models (LLMs) in generating programming code across various prompt strategies, programming languages, and task difficulties. GPT-4 substantially outperforms the other LLMs, including Gemini Ultra and Claude 2, and its coding performance varies considerably with the prompt strategy used. In most of the LeetCode and GeeksforGeeks coding contests evaluated, GPT-4 with the optimal prompt strategy outperforms 85 percent of human participants in a competitive environment, many of whom are students and professionals with moderate programming experience. GPT-4 demonstrates strong capabilities in translating code between programming languages and in learning from past errors. The computational efficiency of GPT-4-generated code is comparable to that of code written by human programmers. GPT-4 can also handle broader programming tasks, including front-end design and database operations. These results suggest that GPT-4 has the potential to serve as a reliable assistant for programming code generation and software development. To facilitate the practical use of LLMs for programming, a programming assistant is designed based on the optimal prompt strategy.
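The headline comparison ("outperforms 85 percent of human participants") is a percentile-style ranking against contest entrants. The sketch below shows one way such a figure could be computed, assuming participants are ordered by score with finish time as the tie-breaker (roughly how online coding contests rank entrants). The abstract does not describe the study's exact ranking procedure, so `Result`, `beats`, and `percent_outperformed` are hypothetical names and the data are made up for illustration.

```python
from typing import NamedTuple, Sequence


class Result(NamedTuple):
    score: int      # points earned in the contest
    minutes: float  # finish time incl. penalty (assumed tie-breaker, not from the study)


def beats(a: Result, b: Result) -> bool:
    """True if a ranks strictly ahead of b: higher score, or equal score and less time."""
    return a.score > b.score or (a.score == b.score and a.minutes < b.minutes)


def percent_outperformed(model: Result, humans: Sequence[Result]) -> float:
    """Percentage of human participants ranked strictly below the model."""
    if not humans:
        raise ValueError("humans must not be empty")
    beaten = sum(1 for h in humans if beats(model, h))
    return 100.0 * beaten / len(humans)


# Illustrative data only (not from the study): 20 hypothetical participants
# with scores 0..19, all finishing at 60 minutes.
humans = [Result(score=s, minutes=60.0) for s in range(20)]

# The model scores 17 but finishes later than the humans who also scored 17,
# so it beats the 17 participants who scored 0..16 and loses the tie.
model = Result(score=17, minutes=75.0)
print(percent_outperformed(model, humans))  # 85.0
```

Counting only strict wins keeps ties from inflating the percentage; a different tie rule (for example, counting ties as half a win) would shift the result slightly.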
