PathBench: Advancing the Benchmark of Large Multimodal Models for Pathology Image Understanding at Patch and Whole Slide Level.

Author information

Sun Yuxuan, Wu Hao, Zhu Chenglu, Si Yixuan, Chen Qizi, Zhang Yunlong, Zhang Kai, Li Jingxiong, Cai Jiatong, Wang Yuhan, Sun Lin, Lin Tao, Yang Lin

Publication information

IEEE Trans Med Imaging. 2025 Jul 2;PP. doi: 10.1109/TMI.2025.3584857.

Abstract

Rapid advances in large multimodal models (LMMs) have significantly expanded their applications in pathology, particularly in image classification, pathology image description, and whole slide image (WSI) classification. A WSI is a gigapixel-scale image composed of thousands of image patches, so patch-level and WSI-level evaluations are both essential, and inherently interconnected, for assessing LMM capabilities. In this work, we propose PathBench, which comprises three subsets (PatchVQA, WSICap, and WSIVQA) covering the patch and WSI levels, to refine and strengthen the validation of LMMs. At the patch level, evaluations on existing multiple-choice Q&A datasets reveal that some LMMs can predict answers without genuinely analyzing the image. To address this, we introduce PatchVQA, a large-scale visual question answering (VQA) dataset containing 5,382 images and 6,335 multiple-choice questions designed with distractor options to prevent shortcut learning. These new questions are rigorously validated by professional pathologists to ensure reliable model assessment. At the WSI level, current efforts focus primarily on image classification tasks and lack diverse validation datasets for multimodal models. To address this, we generate a detailed WSI report dataset by integrating detailed patch descriptions generated by foundation models into comprehensive WSI reports. These are then combined with physician-written reports corresponding to TCGA WSIs, resulting in WSICap, a detailed report dataset containing 7,000 samples. Based on WSICap, we further develop a WSI-level VQA dataset, WSIVQA, to serve as a validation set for WSI LMMs. Using these PathBench subsets, we conduct extensive experiments to benchmark the performance of state-of-the-art LMMs at both the patch and WSI levels. The proposed dataset is available at https://github.com/superjamessyx/PathBench.
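
The observation that some LMMs answer multiple-choice questions without genuine image analysis suggests a simple diagnostic: score the same model with and without the image and compare both against chance. The following is a minimal sketch of that idea, not the evaluation protocol used by the authors. The `MCQuestion` schema, the `Predictor` signature, and the function names are hypothetical and do not reflect the PathBench file format or any model API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class MCQuestion:
    """One multiple-choice item (hypothetical schema, not the PathBench format)."""
    image_path: str
    question: str
    options: Sequence[str]
    answer_index: int


# Assumed interface: a predictor receives an image path (or None when the
# image is withheld), the question text, and the options, and returns the
# index of the option it selects.
Predictor = Callable[[Optional[str], str, Sequence[str]], int]


def accuracy(items: Sequence[MCQuestion], predict: Predictor,
             use_image: bool = True) -> float:
    """Fraction of items answered correctly, optionally withholding the image."""
    if not items:
        raise ValueError("items must not be empty")
    correct = 0
    for item in items:
        image = item.image_path if use_image else None
        if predict(image, item.question, item.options) == item.answer_index:
            correct += 1
    return correct / len(items)


def shortcut_report(items: Sequence[MCQuestion],
                    predict: Predictor) -> Dict[str, float]:
    """Compare accuracy with the image, without it, and at uniform random guessing."""
    chance = sum(1.0 / len(item.options) for item in items) / len(items)
    return {
        "with_image": accuracy(items, predict, use_image=True),
        "blind": accuracy(items, predict, use_image=False),
        "chance": chance,
    }
```

If the `blind` accuracy sits well above `chance` and close to `with_image`, the questions are likely answerable from text cues alone, which is the failure mode that distractor options are meant to suppress.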
