School of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom.
PLoS One. 2024 Nov 6;19(11):e0309368. doi: 10.1371/journal.pone.0309368. eCollection 2024.
Unlike in the field of visual scene recognition, where tremendous advances have taken place due to the availability of very large datasets to train deep neural networks, inference from medical images is often hampered by the fact that only small amounts of data may be available. When working with very small dataset problems, of the order of a few hundred items of data, the power of deep learning may still be exploited by using a pre-trained model as a feature extractor and carrying out classic pattern recognition techniques in this feature space, the so-called few-shot learning problem. However, medical images are highly complex and variable, making it difficult for few-shot learning to fully capture and model these features. To address these issues, we focus on the intrinsic characteristics of the data. We find that, in regimes where the dimension of the feature space is comparable to or even larger than the number of images in the data, dimensionality reduction is a necessity and is often achieved by principal component analysis or singular value decomposition (PCA/SVD). In this paper, noting that SVD is ill-suited to this setting, we explore two alternatives based on discriminant analysis (DA) and non-negative matrix factorization (NMF). Using 14 different datasets spanning 11 distinct disease types, we demonstrate that at low dimensions, discriminant subspaces achieve significant improvements over SVD-based subspaces and the original feature space. We also show that at modest dimensions, NMF is a competitive alternative to SVD in this setting. The implementation of the proposed method is accessible via the following link.
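The pipeline summarized above (frozen pre-trained features, a low-dimensional subspace obtained by SVD, discriminant analysis, or NMF, then a classical classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation: random non-negative vectors stand in for pooled post-ReLU CNN features, nearest-centroid classification is a hypothetical choice of "classic pattern recognition technique", the helper names (`svd_projector`, `lda_projector`, `nmf_projector`, `make_features`) are illustrative, the ridge term added to the within-class scatter is an assumption for handling the feature dimension exceeding the sample count, and NMF uses the standard Lee-Seung multiplicative updates.

```python
import numpy as np


def svd_projector(X, k):
    """PCA/SVD baseline: project onto the top-k right singular vectors of centred X."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return lambda Z: (Z - mu) @ Vt[:k].T


def lda_projector(X, y, k, ridge=1e-3):
    """Fisher discriminant subspace; at most (n_classes - 1) directions are informative."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Assumed ridge: Sw is singular when d exceeds the number of samples.
    Sw += ridge * np.trace(Sw) / d * np.eye(d)
    # Solve Sb w = lambda Sw w by Cholesky whitening (Sw = L L^T).
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))
    _, V = np.linalg.eigh(Linv @ Sb @ Linv.T)  # eigenvalues in ascending order
    W = Linv.T @ V[:, ::-1][:, :k]  # map the top-k eigenvectors back to feature space
    return lambda Z: (Z - mu) @ W


def nmf_projector(X, k, iters=300, seed=0, eps=1e-10):
    """NMF via Lee-Seung multiplicative updates, X ~ A @ H with A, H >= 0."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(iters):
        H *= (A.T @ X) / (A.T @ A @ H + eps)
        A *= (X @ H.T) / (A @ H @ H.T + eps)

    def transform(Z, inner=200):
        # Keep the learned basis H fixed and fit non-negative codes for the rows of Z.
        C = rng.random((Z.shape[0], k)) + eps
        for _ in range(inner):
            C *= (Z @ H.T) / (C @ H @ H.T + eps)
        return C

    return transform


def nearest_centroid_accuracy(Ztr, ytr, Zte, yte):
    """Hypothetical classifier: assign each test point to the closest class mean."""
    classes = np.unique(ytr)
    cents = np.stack([Ztr[ytr == c].mean(axis=0) for c in classes])
    dists = ((Zte[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == yte).mean())


def make_features(means, n_per, rng):
    """Synthetic stand-in for pooled post-ReLU CNN features (non-negative)."""
    X = np.vstack([np.maximum(0.0, m + rng.normal(0.0, 1.0, (n_per, m.size))) for m in means])
    y = np.repeat(np.arange(len(means)), n_per)
    return X, y


rng = np.random.default_rng(0)
means = rng.random((3, 80)) * 2.0         # 3 classes, d = 80 features
Xtr, ytr = make_features(means, 20, rng)  # n = 60 training items, fewer than d
Xte, yte = make_features(means, 20, rng)

results = {"raw": nearest_centroid_accuracy(Xtr, ytr, Xte, yte)}
for name, proj in [("svd", svd_projector(Xtr, 2)),
                   ("lda", lda_projector(Xtr, ytr, 2)),
                   ("nmf", nmf_projector(Xtr, 2))]:
    results[name] = nearest_centroid_accuracy(proj(Xtr), ytr, proj(Xte), yte)
print(results)
```

Two properties of the techniques shape this sketch: Fisher discriminant analysis yields at most C − 1 informative directions for C classes, so discriminant subspaces are naturally very low-dimensional, and NMF requires non-negative input, which pooled ReLU activations satisfy.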