Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Republic of Korea.
Department of Biomedicine & Health Science, College of Medicine, The Catholic University of Korea, Republic of Korea.
Stud Health Technol Inform. 2024 Aug 22;316:552-553. doi: 10.3233/SHTI240473.
Previous studies have been limited to assigning one or two tasks to Large Language Models (LLMs) and have relied on a small number of evaluators from a single domain to assess the LLMs' answers. We assessed the proficiency of four LLMs by applying eight tasks and having 17 evaluators from diverse domains rate the 32 resulting outputs, demonstrating the importance of using varied tasks and evaluators when assessing LLMs.