Wang Shijun, Yao Jianhua, Summers Ronald M
Diagnostic Radiology Department, National Institutes of Health Clinical Center, Building 10, Bethesda, Maryland 20892-1182, USA.
Med Phys. 2008 Apr;35(4):1377-86. doi: 10.1118/1.2870218.
Computer-aided detection (CAD) has been shown to be feasible for polyp detection on computed tomography (CT) scans. After initial detection, the dataset of colonic polyp candidates is both large-scale and high-dimensional. In this article, we propose a nonlinear dimensionality reduction method for large-scale datasets based on the diffusion map and locally linear embedding (DMLLE). We first select a subset of the data as landmarks and map these points into a low-dimensional embedding space using the diffusion map. The embedded landmarks can be viewed as a skeleton of the whole dataset in the low-dimensional space. Then, using the locally linear embedding algorithm, nonlandmark samples are mapped into the same low-dimensional space according to their nearest landmark samples. The local geometry of the original high-dimensional space is thereby preserved in the embedding space. In addition, DMLLE provides a faithful representation of the original high-dimensional data at coarse and fine scales. Thus, it can capture the intrinsic distance relationships between samples and reduce the influence of noisy features, two aspects that are crucial to achieving high classifier performance. We applied the proposed DMLLE method to a colonic polyp dataset of 175,269 polyp candidates with 155 features. Visual inspection shows that true polyps with similar shapes are mapped close together in the low-dimensional space. We compared the performance of a support vector machine (SVM) classifier in the low-dimensional embedding space with three alternatives: an SVM in the original high-dimensional space, an SVM with principal component analysis dimensionality reduction, and an SVM committee using feature selection. Free-response receiver operating characteristic analysis shows that, with DMLLE dimensionality reduction, the SVM achieves higher sensitivity at a lower false positive rate than the other methods. For 6-9 mm polyps (193 true polyps in the test set), at 9 false positives per patient, SVM with DMLLE improves the average sensitivity from 70% to 83% compared with an SVM committee classifier, a state-of-the-art method for colonic polyp detection (p<0.001).
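The two-stage idea described in the abstract (a diffusion map on the landmarks, then locally linear embedding weights to place the remaining samples) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say how landmarks are chosen or which kernel is used, so the random landmark selection, the Gaussian kernel, and the parameters `sigma`, `k`, `reg`, and `t` are all assumptions made here for illustration, as are the function names.

```python
import numpy as np


def diffusion_map(X, n_components=2, sigma=1.0, t=1):
    """Embed the rows of X with a basic diffusion map (Gaussian kernel assumed)."""
    # Pairwise squared Euclidean distances between all points.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))
    d = K.sum(axis=1)
    # S = D^{-1/2} K D^{-1/2} is symmetric and shares eigenvalues with the
    # Markov matrix P = D^{-1} K, so a symmetric eigensolver can be used.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Convert to right eigenvectors of P and drop the trivial constant one.
    psi = vecs / np.sqrt(d)[:, None]
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t


def lle_weights(x, neighbors, reg=1e-3):
    """Weights (summing to 1) that best reconstruct x from its neighbors."""
    Z = neighbors - x
    G = Z @ Z.T  # local Gram matrix
    # Regularize so the linear system stays well conditioned.
    G += (reg * np.trace(G) + 1e-12) * np.eye(len(neighbors))
    w = np.linalg.solve(G, np.ones(len(neighbors)))
    return w / w.sum()


def dmlle(X, n_landmarks, n_components=2, k=5, sigma=1.0, seed=0):
    """Sketch of landmark diffusion map plus LLE placement of the other points."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: landmarks are drawn uniformly at random.
    landmark_idx = rng.choice(n, size=n_landmarks, replace=False)
    is_landmark = np.zeros(n, dtype=bool)
    is_landmark[landmark_idx] = True

    landmarks = X[landmark_idx]
    Y_landmarks = diffusion_map(landmarks, n_components, sigma)

    Y = np.empty((n, n_components))
    Y[landmark_idx] = Y_landmarks
    for i in np.flatnonzero(~is_landmark):
        # k nearest landmarks in the original high-dimensional space.
        dist = np.sum((landmarks - X[i]) ** 2, axis=1)
        nn = np.argsort(dist)[:k]
        w = lle_weights(X[i], landmarks[nn])
        # Reuse the same weights on the embedded landmarks.
        Y[i] = w @ Y_landmarks[nn]
    return Y, landmark_idx
```

Calling `dmlle(X, n_landmarks=50, n_components=2)` on an `(n_samples, n_features)` array returns the embedding of every sample together with the indices of the landmarks. Only the landmarks enter the eigendecomposition, which is what keeps the cost manageable for a dataset of this size; each remaining sample needs only a small `k x k` linear solve.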