Liu Jialin, Liu Siru

Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China.

Department of Medical Informatics, West China Hospital, Sichuan University, Chengdu, China.

J Med Syst. 2025 Jul 28;49(1):100. doi: 10.1007/s10916-025-02232-w.

HealthBench is an open-source, large-scale benchmark consisting of 5,000 multi-turn clinical conversations evaluated against 48,562 criteria developed by clinicians. Recognized as a significant advancement in assessing realistic artificial intelligence (AI) models, HealthBench deserves further exploration. In this article, we systematically analyze the benchmark's disease spectrum, diagnostic and therapeutic focuses, and demographic diversity. We evaluate its representativeness and strengths, as well as the essential limitations that AI researchers and clinicians should consider when using it for realistic model evaluations.

Dissecting HealthBench: Disease Spectrum, Clinical Diversity, and Data Insights from Multi-Turn Clinical AI Evaluation Benchmark.
