
Assessing the Proficiency of LLMs with Various Tasks and Evaluators.

Affiliations

Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Republic of Korea.

Department of Biomedicine & Health Science, College of Medicine, The Catholic University of Korea, Republic of Korea.

Publication Information

Stud Health Technol Inform. 2024 Aug 22;316:552-553. doi: 10.3233/SHTI240473.

Abstract

Previous studies have been limited to assigning one or two tasks to Large Language Models (LLMs) and have relied on a small number of evaluators from a single domain to assess the LLMs' answers. We assessed the proficiency of four LLMs by applying eight tasks and having 17 evaluators from diverse domains evaluate the 32 resulting outputs, demonstrating the importance of using varied tasks and evaluators when assessing LLMs.

