Department of Radiology, University of Pittsburgh, 3362 Fifth Avenue, Room 128, Pittsburgh, PA 15213, USA.
Acad Radiol. 2010 Nov;17(11):1401-8. doi: 10.1016/j.acra.2010.06.009. Epub 2010 Jul 22.
Lesion conspicuity typically correlates strongly with the visual difficulty of detecting a lesion, and computer-aided detection (CAD) has been widely used as a "second reader" in mammography. Hence, increasing CAD sensitivity in detecting subtle cancers without increasing false-positive rates is important. The aim of this study was to investigate the effect of training database case selection on CAD performance in detecting low-conspicuity breast masses.
A full-field digital mammographic image database that included 525 cases depicting malignant masses was randomly partitioned into three subsets. A CAD scheme was applied to detect all initially suspected mass regions and compute region conspicuity. Training samples were iteratively selected from two of the subsets. Four types of training data sets were assembled: (1) one including all available true-positive mass regions in the two subsets ("all"), (2) one including 350 randomly selected mass regions ("diverse"), (3) one including 350 high-conspicuity mass regions ("easy"), and (4) one including 350 low-conspicuity mass regions ("difficult"). Each training data set also included as many randomly selected false-positive regions as true-positive regions. Two classifiers, an artificial neural network (ANN) and a k-nearest neighbor (KNN) algorithm, were trained using each of the four training data sets and tested on all suspected regions in the remaining subset. Using a threefold cross-validation method, the performance changes of the CAD schemes trained using each of the four training data sets were computed and compared.
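The assembly of the four training data sets can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Region` tuple, the function name `build_training_sets`, the convention that a higher score means higher conspicuity, and the sort-based selection of "easy" and "difficult" regions are all assumptions made for the example. The abstract states only that samples were selected iteratively and does not describe the procedure.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: (conspicuity score, is_true_positive).
# Assumption: a higher score means a more conspicuous (easier) region.
Region = Tuple[float, bool]


def build_training_sets(
    regions: List[Region], n_select: int = 350, seed: int = 0
) -> Dict[str, List[Region]]:
    """Assemble the "all", "diverse", "easy", and "difficult" training sets."""
    rng = random.Random(seed)
    true_pos = [r for r in regions if r[1]]
    false_pos = [r for r in regions if not r[1]]
    n = min(n_select, len(true_pos))

    # Sort true-positive regions from least to most conspicuous.
    by_conspicuity = sorted(true_pos, key=lambda r: r[0])
    positives = {
        "all": list(true_pos),
        "diverse": rng.sample(true_pos, n),
        "easy": by_conspicuity[len(by_conspicuity) - n:],
        "difficult": by_conspicuity[:n],
    }

    # Balance each set with an equal number of randomly chosen false positives.
    training_sets = {}
    for name, pos in positives.items():
        neg = rng.sample(false_pos, min(len(pos), len(false_pos)))
        training_sets[name] = pos + neg
    return training_sets


if __name__ == "__main__":
    # Synthetic data for demonstration only; not the study's data.
    demo_rng = random.Random(1)
    demo = [(demo_rng.random(), True) for _ in range(800)]
    demo += [(demo_rng.random(), False) for _ in range(3000)]
    for name, samples in build_training_sets(demo).items():
        print(name, len(samples))
```

In a threefold cross-validation, a function like this would be called three times, each time on the regions from two subsets, with the third subset held out for testing.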
CAD initially detected 1025 true-positive mass regions depicted on 507 cases (97% case-based sensitivity) and 9569 false-positive regions (3.5 per image) in the entire database. Using the all training data set, CAD achieved the highest overall performance on the entire testing database. However, CAD detected the highest number of low-conspicuity masses when the difficult training data set was used for training. Results were consistent for both ANN-based and KNN-based classifiers in all tests. Compared to the use of the all training data set, the sensitivity of the schemes trained using the difficult data set decreased by 8.6% and 8.4% for the ANN and KNN algorithms, respectively, on the entire database, but the detection of low-conspicuity masses increased by 7.1% and 15.1% for the ANN and KNN algorithms, respectively, at a false-positive rate of 0.3 per image.
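The reported initial detection figures can be checked with simple arithmetic. The snippet below is a back-of-envelope sketch using only numbers stated in the abstract. The implied image count is an inference from the rounded rate of 3.5 false positives per image, not a figure reported by the authors.

```python
# Figures taken from the abstract.
cases_total = 525
cases_detected = 507
false_positive_regions = 9569
false_positives_per_image = 3.5  # reported value, presumably rounded

# Case-based sensitivity: fraction of cases with at least one detected mass region.
case_sensitivity = cases_detected / cases_total

# Inference only: approximate number of images implied by the false-positive rate.
implied_images = false_positive_regions / false_positives_per_image

print(f"case-based sensitivity: {case_sensitivity:.1%}")
print(f"implied image count: about {implied_images:.0f}")
```

The case-based sensitivity evaluates to about 96.6%, consistent with the reported 97% after rounding.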
CAD performance depends on the size, diversity, and difficulty level of the training database. To increase CAD sensitivity in detecting subtle cancers, one should increase the fraction of difficult cases in the training database rather than simply increasing the training data set size.