
VisEval: A Benchmark for Data Visualization in the Era of Large Language Models.

Authors

Chen Nan, Zhang Yuge, Xu Jiahang, Ren Kan, Yang Yuqing

Publication information

IEEE Trans Vis Comput Graph. 2025 Jan;31(1):1301-1311. doi: 10.1109/TVCG.2024.3456320. Epub 2024 Nov 25.

DOI: 10.1109/TVCG.2024.3456320
PMID: 39255134
Abstract

Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However, the lack of a comprehensive and reliable benchmark hinders our understanding of LLMs' capabilities in visualization generation. In this paper, we address this gap by proposing a new NL2VIS benchmark called VisEval. Firstly, we introduce a high-quality and large-scale dataset. This dataset includes 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths. Secondly, we advocate for a comprehensive automated evaluation methodology covering multiple dimensions, including validity, legality, and readability. By systematically scanning for potential issues with a number of heterogeneous checkers, VisEval provides reliable and trustworthy evaluation outcomes. We run VisEval on a series of state-of-the-art LLMs. Our evaluation reveals prevalent challenges and delivers essential insights for future advancements.
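
The abstract describes the evaluation only at a high level: several heterogeneous checkers scan each generated visualization for issues along the validity, legality, and readability dimensions. The following is a minimal Python sketch of how such a checker pipeline could be composed. It is not VisEval's actual implementation: the `Chart` and `GroundTruth` types, the individual checker functions, and the assignment of each check to a dimension are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chart:
    """Hypothetical stand-in for a visualization produced by an LLM."""
    rendered: bool         # did the generated code run and produce a chart?
    chart_type: str        # e.g. "bar" or "line"
    data: list             # plotted (x, y) pairs
    has_axis_labels: bool  # are both axes labeled?


@dataclass
class GroundTruth:
    """Hypothetical labeled answer for one query."""
    chart_type: str
    data: list


# A checker returns None when it finds no issue, otherwise a short message.
Checker = Callable[[Chart, GroundTruth], Optional[str]]


def check_renders(chart: Chart, truth: GroundTruth) -> Optional[str]:
    return None if chart.rendered else "generated code did not produce a chart"


def check_chart_type(chart: Chart, truth: GroundTruth) -> Optional[str]:
    if chart.chart_type == truth.chart_type:
        return None
    return f"chart type '{chart.chart_type}' does not match expected '{truth.chart_type}'"


def check_data(chart: Chart, truth: GroundTruth) -> Optional[str]:
    # Compare plotted points while ignoring their order.
    if sorted(chart.data) == sorted(truth.data):
        return None
    return "plotted data does not match the ground truth"


def check_axis_labels(chart: Chart, truth: GroundTruth) -> Optional[str]:
    return None if chart.has_axis_labels else "axis labels are missing"


# Assumed mapping of checkers to the three dimensions named in the abstract.
CHECKERS: list[tuple[str, Checker]] = [
    ("validity", check_renders),
    ("legality", check_chart_type),
    ("legality", check_data),
    ("readability", check_axis_labels),
]


def evaluate(chart: Chart, truth: GroundTruth) -> dict[str, list[str]]:
    """Run every checker and group the reported issues by dimension."""
    issues: dict[str, list[str]] = {}
    for dimension, checker in CHECKERS:
        message = checker(chart, truth)
        if message is not None:
            issues.setdefault(dimension, []).append(message)
            if dimension == "validity":
                break  # nothing else can be checked if no chart exists
    return issues
```

In this sketch, a chart that renders but uses the wrong chart type and omits axis labels would be reported with one legality issue and one readability issue, while an empty result means no checker found a problem. Keeping each checker as an independent function is one way to let heterogeneous checks (rule-based, data-based, or model-based) share a single interface.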

Similar articles

1. VisEval: A Benchmark for Data Visualization in the Era of Large Language Models.
   IEEE Trans Vis Comput Graph. 2025 Jan;31(1):1301-1311. doi: 10.1109/TVCG.2024.3456320. Epub 2024 Nov 25.
2. Natural Language to Visualization by Neural Machine Translation.
   IEEE Trans Vis Comput Graph. 2022 Jan;28(1):217-226. doi: 10.1109/TVCG.2021.3114848. Epub 2021 Dec 24.
3. CARDBiomedBench: A Benchmark for Evaluating Large Language Model Performance in Biomedical Research: A novel question-and-answer benchmark designed to assess Large Language Models' comprehension of biomedical research, piloted on Neurodegenerative Diseases.
   bioRxiv. 2025 Jan 21:2025.01.15.633272. doi: 10.1101/2025.01.15.633272.
4. The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review.
   JMIR Med Inform. 2024 May 10;12:e53787. doi: 10.2196/53787.
5. Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation.
   JMIR Med Inform. 2024 Jun 28;12:e57674. doi: 10.2196/57674.
6. A dataset and benchmark for hospital course summarization with adapted large language models.
   J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
7. A Comprehensive Survey of ChatGPT: Advancements, Applications, Prospects, and Challenges.
   Meta Radiol. 2023 Sep;1(2). doi: 10.1016/j.metrad.2023.100022. Epub 2023 Oct 7.
8. Assessing and Optimizing Large Language Models on Spondyloarthritis Multi-Choice Question Answering: Protocol for Enhancement and Assessment.
   JMIR Res Protoc. 2024 May 24;13:e57001. doi: 10.2196/57001.
9. Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study.
   JMIR Med Inform. 2024 Sep 4;12:e59258. doi: 10.2196/59258.
10. The Applications of Large Language Models in Mental Health: Scoping Review.
    J Med Internet Res. 2025 May 5;27:e69284. doi: 10.2196/69284.