Sahiner B, Chan H P, Petrick N, Helvie M A, Goodsitt M M
Department of Radiology, University of Michigan, Ann Arbor 48109-0904, USA.
Phys Med Biol. 1998 Oct;43(10):2853-71. doi: 10.1088/0031-9155/43/10/014.
A genetic algorithm (GA) based feature selection method was developed for the design of high-sensitivity classifiers, that is, classifiers tailored to maintain high specificity while operating at high sensitivity. The fitness function of the GA was based on the receiver operating characteristic (ROC) partial area index, which is defined as the average specificity above a given sensitivity threshold. The designed GA evolved towards the selection of feature combinations which yielded high specificity in the high-sensitivity region of the ROC curve, regardless of the performance at low sensitivity. This is a desirable quality of a classifier used for breast lesion characterization, since the goal there is to correctly diagnose as many benign lesions as possible without missing malignancies. The high-sensitivity classifier, formulated as Fisher's linear discriminant using GA-selected feature variables, was employed to classify 255 biopsy-proven mammographic masses as malignant or benign. The mammograms were digitized at a pixel size of 0.1 mm × 0.1 mm, and regions of interest (ROIs) containing the biopsied masses were extracted by an experienced radiologist. A recently developed image transformation technique, referred to as the rubber-band straightening transform, was applied to the ROIs. Texture features extracted from the spatial grey-level dependence and run-length statistics matrices of the transformed ROIs were used to distinguish malignant and benign masses. The classification accuracy of the high-sensitivity classifier was compared with that of linear discriminant analysis with stepwise feature selection (LDAsfs). With proper GA training, the ROC partial area of the high-sensitivity classifier above a true-positive fraction of 0.95 was significantly larger than that of LDAsfs, although the latter provided a higher total area (Az) under the ROC curve. By setting an appropriate decision threshold, the high-sensitivity classifier and LDAsfs correctly identified 61% and 34% of the benign masses, respectively, without missing any malignant masses. Our results show that the choice of the feature selection technique is important in computer-aided diagnosis, and that the GA may be a useful tool for designing classifiers for lesion characterization.
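The partial area index and the "no missed malignancies" operating point described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the ROC curve was estimated, so the sketch assumes a crude empirical (step-function) estimate. The function names `specificity_at_sensitivity` and `partial_area_index`, the grid size, and the toy scores are all hypothetical and chosen only for illustration.

```python
import numpy as np


def specificity_at_sensitivity(scores, labels, min_sensitivity=1.0):
    """Best empirical specificity at a threshold that keeps sensitivity
    >= min_sensitivity. Higher scores mean "more likely malignant";
    labels are 1 (malignant) or 0 (benign)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    malignant = np.sort(scores[labels])  # ascending
    benign = scores[~labels]
    # Number of malignant cases that may fall below the threshold.
    n_missed = int(np.floor((1.0 - min_sensitivity) * malignant.size + 1e-9))
    n_missed = min(n_missed, malignant.size - 1)
    # A case is called malignant when its score is >= threshold.
    threshold = malignant[n_missed]
    return float(np.mean(benign < threshold))


def partial_area_index(scores, labels, tpf0=0.95, n_grid=201):
    """Average specificity over sensitivities in [tpf0, 1], approximated
    on a uniform grid (assumption: empirical estimate, no curve fitting)."""
    grid = np.linspace(tpf0, 1.0, n_grid)
    spec = [specificity_at_sensitivity(scores, labels, s) for s in grid]
    return float(np.mean(spec))


# Toy data (invented): 4 malignant and 5 benign discriminant scores.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1, 0.2, 0.4, 0.5, 0.6]

spec_full = specificity_at_sensitivity(toy_scores, toy_labels, 1.0)
pai_095 = partial_area_index(toy_scores, toy_labels, tpf0=0.95)
print(spec_full, pai_095)
```

In a GA-based feature selection loop, a quantity like `partial_area_index` could serve as the fitness of a candidate feature subset, computed from the discriminant scores that subset produces. `specificity_at_sensitivity(..., 1.0)` corresponds to the fraction of benign cases correctly identified without missing any malignant case.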