BioMedIA, Imperial College London, London, United Kingdom.
Laboratory for Ophthalmic Image Analysis, Medical University of Vienna, Vienna, Austria.
Med Image Anal. 2024 Oct;97:103296. doi: 10.1016/j.media.2024.103296. Epub 2024 Aug 10.
Deep learning has potential to automate screening, monitoring and grading of disease in medical images. Pretraining with contrastive learning enables models to extract robust and generalisable features from natural image datasets, facilitating label-efficient downstream image analysis. However, the direct application of conventional contrastive methods to medical datasets introduces two domain-specific issues. Firstly, several image transformations which have been shown to be crucial for effective contrastive learning do not translate from the natural image to the medical image domain. Secondly, the assumption made by conventional methods, that any two images are dissimilar, is systematically misleading in medical datasets whose images depict the same anatomy and disease. This is exacerbated in longitudinal image datasets that repeatedly image the same patient cohort to monitor their disease progression over time. In this paper, we tackle these issues by extending conventional contrastive frameworks with a novel metadata-enhanced strategy. Our approach employs widely available patient metadata to approximate the true set of inter-image contrastive relationships. To this end, we employ records of patient identity, eye position (i.e. left or right) and time series information. In experiments using two large longitudinal datasets containing 170,427 retinal optical coherence tomography (OCT) images of 7,912 patients with age-related macular degeneration (AMD), we evaluate the utility of using metadata to incorporate the temporal dynamics of disease progression into pretraining. Our metadata-enhanced approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks related to AMD. We find benefits in both low-data and high-data regimes across tasks ranging from AMD stage and type classification to prediction of visual acuity. Due to its modularity, our method can be quickly and cost-effectively tested to establish the potential benefits of including available metadata in contrastive pretraining.
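To make the pretraining strategy concrete, the sketch below shows one way patient metadata could define the positive set in a supervised-contrastive-style (SupCon-style) loss: scans sharing the same patient identity and eye position are treated as positives rather than as negatives. This is a minimal, hypothetical illustration, not the authors' implementation; the function name, argument layout, the choice of a SupCon-style objective, and the omission of the paper's time-series information are all assumptions made here for clarity.

```python
# Minimal sketch (assumption: a SupCon-style objective; NOT the paper's
# exact formulation). Positives are defined by metadata -- scans sharing
# the same (patient_id, eye) -- instead of only augmented views of a
# single image. All names are illustrative.
import torch
import torch.nn.functional as F

def metadata_contrastive_loss(embeddings: torch.Tensor,
                              patient_ids: torch.Tensor,
                              eyes: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """embeddings: (N, D) features; patient_ids: (N,) ints;
    eyes: (N,) ints (0 = left, 1 = right)."""
    z = F.normalize(embeddings, dim=1)       # unit-norm features
    sim = z @ z.t() / temperature            # (N, N) scaled cosine similarity

    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)

    # Metadata-defined positive set: same patient AND same eye, no self-pairs.
    same_patient = patient_ids.unsqueeze(0) == patient_ids.unsqueeze(1)
    same_eye = eyes.unsqueeze(0) == eyes.unsqueeze(1)
    pos_mask = same_patient & same_eye & ~self_mask

    # Softmax denominator over all non-self pairs.
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-likelihood over each anchor's positives; anchors with
    # no metadata-matched positive in the batch are skipped.
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    sum_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return -(sum_log_prob_pos[valid] / pos_counts[valid]).mean()

# Toy usage: a batch of 8 OCT embeddings from 3 patients.
z = torch.randn(8, 128)
pid = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])
eye = torch.tensor([0, 0, 1, 0, 0, 1, 1, 0])
print(metadata_contrastive_loss(z, pid, eye))
```

In this formulation, longitudinal scans of the same eye are pulled together in feature space instead of being pushed apart as false negatives, which is the failure mode of conventional contrastive pretraining the abstract describes. The paper's time-series records could additionally reweight or restrict this positive set to reflect disease progression; that component is deliberately omitted from the sketch.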