Quilt-1M: One Million Image-Text Pairs for Histopathology.

Authors

Ikezogwo Wisdom O, Seyfioglu Mehmet S, Ghezloo Fatemeh, Geva Dylan, Mohammed Fatwir S, Anand Pavan K, Krishna Ranjay, Shapiro Linda G

Affiliation

University of Washington.

Publication

Adv Neural Inf Process Syst. 2023 Dec;36(DB1):37995-38017.

PMID: 38742142
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11090501/
Abstract

Recent accelerations in multi-modal applications have been made possible with the plethora of image and text data available online. However, the scarcity of analogous data in the medical field, specifically in histopathology, has halted comparable progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource of videos, offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 768,826 image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with 1M paired image-text samples, marking it as the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new histopathology images across 13 diverse patch-level datasets of 8 different sub-pathologies and cross-modal retrieval tasks.
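
The zero-shot classification and cross-modal retrieval evaluations mentioned in the abstract both reduce to comparing image and text embeddings in a shared space. The following is a minimal sketch of that idea, not the authors' evaluation code: the function names (`cosine`, `zero_shot_classify`, `recall_at_k`) are hypothetical, and the small hand-written vectors are toy stand-ins for the outputs of a CLIP-style image encoder and text encoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embs: Sequence[Vector],
                       prompt_embs: Sequence[Vector]) -> List[int]:
    """For each image, return the index of the most similar class prompt."""
    return [
        max(range(len(prompt_embs)), key=lambda c: cosine(img, prompt_embs[c]))
        for img in image_embs
    ]


def recall_at_k(image_embs: Sequence[Vector],
                text_embs: Sequence[Vector], k: int) -> float:
    """Image-to-text Recall@k, assuming image i is paired with text i."""
    hits = 0
    for i, img in enumerate(image_embs):
        ranked = sorted(range(len(text_embs)),
                        key=lambda j: cosine(img, text_embs[j]),
                        reverse=True)
        hits += i in ranked[:k]
    return hits / len(image_embs)


# Toy embeddings (assumed values for illustration, not model outputs).
prompt_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # one text prompt per class
image_embs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.7, 0.3, 0.2]]
predictions = zero_shot_classify(image_embs, prompt_embs)

caption_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
paired_image_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.8, 0.1, 0.6]]
r_at_1 = recall_at_k(paired_image_embs, caption_embs, k=1)
r_at_2 = recall_at_k(paired_image_embs, caption_embs, k=2)
```

In a real evaluation the embeddings would come from the fine-tuned encoders, and each class prompt would be a short text description of the class. Linear probing differs in that a linear classifier is trained on frozen image embeddings rather than compared against text prompts.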

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5384/11090501/fa66f0e100f3/nihms-1938476-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5384/11090501/141e40245e1d/nihms-1938476-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5384/11090501/9fd049926151/nihms-1938476-f0002.jpg
