Benchmarking Large Language Models for Cervical Spondylosis.

Author information

Zhang Boyan, Du Yueqi, Duan Wanru, Chen Zan

Affiliations

Xuanwu Hospital, Capital Medical University, Beijing, China.

Lab of Spinal Cord Injury and Functional Reconstruction, China International Neuroscience Institute, Beijing, China.

Publication information

JMIR Form Res. 2024 Aug 5;8:e55577. doi: 10.2196/55577.

Abstract

Cervical spondylosis is the most common degenerative spinal disorder in modern societies. Patients need extensive medical information, and large language models (LLMs) offer them a novel and convenient tool for obtaining medical advice. In this study, we collected the questions most frequently asked by patients with cervical spondylosis in clinical practice and in internet consultations. The accuracy of the answers provided by the LLMs was evaluated and graded by 3 experienced spinal surgeons. Comparative analysis of the responses showed that all LLMs provided satisfactory results and that, among them, GPT-4 had the highest accuracy rate. Variation in performance across the question sections for all LLMs revealed the boundaries of their capabilities and suggested directions for the development of artificial intelligence.
