Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE.

Affiliation

Department of Radiology, University of Chicago, Chicago, Illinois 60637, USA.

Publication information

Med Phys. 2010 Jan;37(1):339-51. doi: 10.1118/1.3267037.

Abstract

PURPOSE

In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) with 356 cases, and full-field digital mammography (FFDM) with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)].
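As a rough illustration of the first of these two methods, the following is a minimal NumPy sketch of Laplacian eigenmaps in the spirit of Belkin and Niyogi. It is not the study's implementation: the binary k-nearest-neighbor graph, the parameter names `n_neighbors` and `n_components`, and the assumption that the neighbor graph is connected are all choices made here for illustration.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, n_neighbors=5):
    """Embed the rows of X with Laplacian eigenmaps (illustrative sketch).

    Assumes a symmetrized k-nearest-neighbor graph with binary weights
    and that this graph is connected.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor

    # Binary adjacency: connect i and j if either is among the
    # other's n_neighbors nearest neighbors.
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)

    # Solve L y = lambda D y through the symmetric normalized Laplacian
    # L_sym = I - D^{-1/2} W D^{-1/2}, whose eigenvectors v give y = D^{-1/2} v.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order

    # Skip the first (trivial, eigenvalue 0) eigenvector.
    return d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]
```

In practice, library implementations such as scikit-learn's `SpectralEmbedding` (Laplacian eigenmaps) and `TSNE` would normally be used instead of hand-written code; whether the study relied on any particular library is not stated in the abstract.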

METHODS

These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: linear discriminant analysis and a Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN), respectively. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination (ARD) and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+ bootstrap validation, 95% empirical confidence intervals were computed for each classifier's AUC performance.
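The validation step can be sketched as follows, with several loudly stated simplifications. This example uses the plain 0.632 bootstrap rather than the 0.632+ variant named above, a hand-written Fisher linear discriminant in place of the study's classifiers, and a percentile interval over bootstrap replicates that is only one plausible way to form an empirical interval. The function names `auc`, `fit_lda`, and `auc_632` are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def fit_lda(X, y):
    """Fisher discriminant direction w = Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    # Small ridge term keeps the within-class scatter invertible.
    Sw = np.atleast_2d(Sw) + 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

def auc_632(X, y, n_boot=200, seed=0):
    """Plain 0.632 bootstrap AUC (not 0.632+) with a percentile interval."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Resubstitution AUC: train and test on the full data set.
    auc_resub = auc(X @ fit_lda(X, y), y)
    oob = []
    for _ in range(n_boot):
        train = rng.integers(0, n, size=n)            # sample with replacement
        test = np.setdiff1d(np.arange(n), train)      # out-of-bag cases
        # Skip replicates lacking enough cases of either class.
        if np.bincount(y[train], minlength=2).min() < 2:
            continue
        if np.bincount(y[test], minlength=2).min() < 1:
            continue
        w = fit_lda(X[train], y[train])
        oob.append(auc(X[test] @ w, y[test]))
    combined = 0.368 * auc_resub + 0.632 * np.array(oob)
    lo, hi = np.percentile(combined, [2.5, 97.5])
    return float(combined.mean()), (float(lo), float(hi))
```

Here `X` would be the reduced-dimension mapping (for example, a 4D embedding) and `y` an integer array of 0/1 malignancy labels. The weighting 0.368/0.632 corrects the optimism of resubstitution with the pessimism of out-of-bag testing; the 0.632+ variant additionally adapts these weights to the degree of overfitting.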

RESULTS

In the large U.S. data set, representative high-performance results, all obtained with the MCMC-BANN, were: AUC0.632+ = 0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD-selected features; AUC0.632+ = 0.87 with interval [0.817;0.906] for four LSW-selected features; and AUC0.632+ = 0.90 with interval [0.847;0.919] for the 4D t-SNE mapping of the original 81D feature space.

CONCLUSIONS

Preliminary results suggest that the new methods can match or exceed the classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement for feature selection in CADx problems, DR techniques offer a complementary approach that can help elucidate additional properties of the data. Specifically, the new techniques were shown to have the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure in the feature space.
