Sahiner B, Chan HP, Petrick N, Wagner RF, Hadjiiski L
Department of Radiology, University of Michigan, Ann Arbor, MI 48109-0904, USA.
Med Phys. 2000 Jul;27(7):1509-22. doi: 10.1118/1.599017.
In computer-aided diagnosis (CAD), a frequently used approach for distinguishing normal and abnormal cases is first to extract potentially useful features for the classification task. Effective features are then selected from this entire pool of available features. Finally, a classifier is designed using the selected features. In this study, we investigated the effect of finite sample size on classification accuracy when classifier design involves stepwise feature selection in linear discriminant analysis, which is the most commonly used feature selection algorithm for linear classifiers. The feature selection and the classifier coefficient estimation steps were considered to be cascading stages in the classifier design process. We compared the performance of the classifier when feature selection was performed on the design samples alone and on the entire set of available samples, which consisted of design and test samples. The area Az under the receiver operating characteristic curve was used as our performance measure. After linear classifier coefficient estimation using the design samples, we studied the hold-out and resubstitution performance estimates. The two classes were assumed to have multidimensional Gaussian distributions, with a large number of features available for feature selection. We investigated the dependence of feature selection performance on the covariance matrices and means for the two classes, and examined the effects of sample size, number of available features, and parameters of stepwise feature selection on classifier bias. Our results indicated that the resubstitution estimate was always optimistically biased, except in cases where the parameters of stepwise feature selection were chosen such that too few features were selected by the stepwise procedure. When feature selection was performed using only the design samples, the hold-out estimate was always pessimistically biased. When feature selection was performed using the entire finite sample space, the hold-out estimates could be pessimistically or optimistically biased, depending on the number of features available for selection, the number of available samples, and their statistical distribution. For our simulation conditions, these estimates were always pessimistically (conservatively) biased if the ratio of the total number of available samples per class to the number of available features was greater than five.
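The following is a minimal simulation sketch, not the authors' code, illustrating the design the abstract describes: two multivariate Gaussian classes, feature selection performed either on the design samples alone or on the pooled design-plus-test samples, linear discriminant coefficients estimated on the design samples only, and resubstitution versus hold-out Az estimates compared. The sample sizes, number of features, and mean shift are illustrative assumptions, and the greedy forward selection on training Az is a simplified stand-in for the paper's stepwise procedure (which is typically driven by F-to-enter/F-to-remove thresholds).

```python
# Sketch of the finite-sample bias experiment described in the abstract.
# All constants below are illustrative assumptions, not values from the paper.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

N_FEATURES = 30      # features available for selection (assumption)
N_USEFUL = 5         # features with a true class-mean difference (assumption)
N_PER_CLASS = 50     # design samples per class (assumption)
MAX_SELECTED = 5     # cap on selected features, loosely mimicking F-to-enter limits


def sample_class(n, shift):
    """Draw n samples from a unit-covariance Gaussian with the given mean shift."""
    mean = np.zeros(N_FEATURES)
    mean[:N_USEFUL] = shift
    return rng.normal(loc=mean, scale=1.0, size=(n, N_FEATURES))


def make_set(n_per_class, shift=0.5):
    X = np.vstack([sample_class(n_per_class, 0.0), sample_class(n_per_class, shift)])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


def forward_select(X, y, max_features):
    """Greedy forward selection maximizing training Az (simplified stepwise)."""
    selected, remaining, best_auc = [], list(range(X.shape[1])), 0.0
    for _ in range(max_features):
        candidates = []
        for j in remaining:
            cols = selected + [j]
            lda = LinearDiscriminantAnalysis().fit(X[:, cols], y)
            auc = roc_auc_score(y, lda.decision_function(X[:, cols]))
            candidates.append((auc, j))
        auc, j = max(candidates)
        if auc <= best_auc:          # stop when no feature improves training Az
            break
        best_auc = auc
        selected.append(j)
        remaining.remove(j)
    return selected


# Design (training) and test samples.
X_design, y_design = make_set(N_PER_CLASS)
X_test, y_test = make_set(N_PER_CLASS)

for label, X_sel, y_sel in [
    ("selection on design samples only", X_design, y_design),
    ("selection on design + test samples",
     np.vstack([X_design, X_test]), np.concatenate([y_design, y_test])),
]:
    feats = forward_select(X_sel, y_sel, MAX_SELECTED)
    # Coefficients are always estimated on the design samples alone.
    lda = LinearDiscriminantAnalysis().fit(X_design[:, feats], y_design)
    az_resub = roc_auc_score(y_design, lda.decision_function(X_design[:, feats]))
    az_holdout = roc_auc_score(y_test, lda.decision_function(X_test[:, feats]))
    print(f"{label}: resubstitution Az = {az_resub:.3f}, hold-out Az = {az_holdout:.3f}")
```

Under the abstract's findings, repeated runs of such a simulation would be expected to show the resubstitution Az exceeding the hold-out Az (optimistic bias), with the hold-out estimate pessimistically biased when selection uses the design samples only, and potentially biased in either direction when selection uses the pooled sample.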