Ikezogwo Wisdom O, Seyfioglu Mehmet S, Ghezloo Fatemeh, Geva Dylan, Mohammed Fatwir S, Anand Pavan K, Krishna Ranjay, Shapiro Linda G
University of Washington.
Adv Neural Inf Process Syst. 2023 Dec;36(DB1):37995-38017.
Recent progress in multi-modal applications has been driven by the plethora of image and text data available online. However, the scarcity of analogous data in the medical field, specifically in histopathology, has hindered comparable progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 768,826 image-text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with 1M paired image-text samples, making it the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on zero-shot and linear-probing classification of new histopathology images across 13 diverse patch-level datasets spanning 8 different sub-pathologies, as well as on cross-modal retrieval tasks.
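To make the evaluation setting concrete, below is a minimal sketch of CLIP-style zero-shot classification of a histopathology patch, the task on which the abstract reports results. It uses the open_clip library with a generic OpenAI checkpoint; the checkpoint name, the file path patch.png, and the sub-pathology class prompts are illustrative assumptions, not the authors' released artifacts. A model fine-tuned on Quilt-1M would plug into the same pattern.

# A minimal sketch of CLIP-style zero-shot classification on a single
# histopathology patch. Checkpoint, file path, and class prompts are
# illustrative assumptions.
import torch
import open_clip
from PIL import Image

# Load a pre-trained CLIP model and its matching image preprocessing;
# a Quilt-1M fine-tuned checkpoint (if available) would load the same way.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Hypothetical sub-pathology class names turned into text prompts.
class_names = ["adenocarcinoma", "squamous cell carcinoma", "normal tissue"]
prompts = [f"a histopathology image of {c}" for c in class_names]

image = preprocess(Image.open("patch.png")).unsqueeze(0)  # one patch
text = tokenizer(prompts)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then score the patch against each class prompt by
    # cosine similarity; softmax turns scores into a distribution.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print({c: float(p) for c, p in zip(class_names, probs[0])})

The predicted class is simply the prompt with the highest probability; no histopathology-specific training code is needed at inference time, which is what makes the zero-shot evaluation across 13 patch-level datasets possible.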