Machine learning-based receiver operating characteristic (ROC) curves for crisp and fuzzy classification of DNA microarrays in cancer research.

Author information

Peterson Leif E, Coleman Matthew A

Affiliation

Baylor College of Medicine, Houston, Texas 77030 USA.

Publication information

Int J Approx Reason. 2008 Jan;47(1):17-36. doi: 10.1016/j.ijar.2007.03.006.

Abstract

Receiver operating characteristic (ROC) curves were generated to obtain the classification area under the curve (AUC) as a function of feature standardization, fuzzification, and sample size for nine large cancer-related DNA microarray data sets. Classifiers used included k nearest neighbor (kNN), naïve Bayes classifier (NBC), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), learning vector quantization (LVQ1), logistic regression (LOG), polytomous logistic regression (PLOG), artificial neural networks (ANN), particle swarm optimization (PSO), constricted particle swarm optimization (CPSO), kernel regression (RBF), radial basis function networks (RBFN), gradient descent support vector machines (SVMGD), and least squares support vector machines (SVMLS). For each data set, AUC was determined for multiple combinations of sample size and total sum[-log(p)] of feature t-tests, with and without feature standardization, and with (fuzzy) and without (crisp) fuzzification of features. In total, 2,123,530 classification runs were made. At the greatest sample size, ANN resulted in a fitted AUC of 90%, while PSO resulted in the lowest fitted AUC of 72.1%. AUC values derived from 4NN were the most dependent on sample size, while those from PSO were the least dependent. ANN was the most dependent on the total statistical significance of the features used, as measured by sum[-log(p)], whereas PSO was the least dependent. Standardization of features increased AUC by 8.1% for PSO and decreased it by 0.2% for QDA, while fuzzification increased AUC by 9.4% for PSO and decreased it by 3.8% for QDA. AUC determination in planned microarray experiments without standardization or fuzzification of features will benefit the most if CPSO is used for lower levels of feature significance (i.e., sum[-log(p)] ~ 50) and ANN for greater levels of significance (i.e., sum[-log(p)] ~ 500).
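The abstract does not say how each ROC curve was turned into an AUC value. As a rough illustration only, the Python sketch below builds an empirical ROC curve from a classifier's scores for a binary (two-class) problem by sweeping a decision threshold from high to low, integrates it with the trapezoidal rule, and cross-checks the result against the equivalent Mann–Whitney form (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half). The function names, the 0/1 label coding, and the example scores are hypothetical; the study's own ROC construction, including how multi-class data sets or fuzzy classifier outputs were handled, may differ.

```python
"""Empirical ROC curve and AUC for a binary classifier (illustrative sketch)."""


def roc_points(scores, labels):
    """Return (FPR, TPR) points, sweeping the decision threshold from high to low.

    Labels are assumed to be 1 (positive class) or 0 (negative class).
    Samples with tied scores are added in one step, so ties produce a
    diagonal segment rather than an arbitrary staircase.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Every sample scoring >= threshold is now called positive.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_mann_whitney(scores, labels):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier outputs, e.g. posterior probability of "cancer".
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0, 0]
    print(auc_trapezoid(roc_points(scores, labels)))  # 11/12, about 0.917
    print(auc_mann_whitney(scores, labels))           # 11/12, about 0.917
```

The two functions agree by construction, so the Mann–Whitney form is a convenient sanity check when a full ROC curve is not needed; it costs O(n_pos · n_neg) comparisons, whereas the threshold sweep costs one sort.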
When only standardization of features is performed, studies are likely to benefit most by using CPSO for low levels of feature statistical significance and LVQ1 for greater levels of significance. Studies involving only fuzzification of features should employ LVQ1 because of the substantial observed gain in AUC and the low computational expense of LVQ1. Lastly, PSO resulted in significantly greater levels of AUC (89.5% on average) when both feature standardization and fuzzification were performed. Given the data sets used and the factors influencing AUC that were investigated, LVQ1 is recommended if low-expense computation is desired. If computational expense is of less concern, PSO or CPSO is recommended.
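The abstract measures feature significance as the total sum[-log(p)] of per-feature t-tests and contrasts runs with and without feature standardization. The Python sketch below shows one way to compute both, under assumptions that the abstract does not confirm: it uses Welch's unequal-variance two-sample t-test (the t-test variant is not stated), takes two-sided p-values from the Student t distribution via the regularized incomplete beta function, uses the natural logarithm for -log(p) (the base is not stated), and standardizes a feature to zero mean and unit sample standard deviation across all samples. All function names and the toy expression values are hypothetical, and only the Python standard library is used so that the sketch runs without SciPy.

```python
"""Per-feature Welch t-tests, total sum[-log(p)], and z-score standardization.

Illustrative sketch only: the t-test variant, log base, and standardization
scheme are assumptions, not the study's documented method.
"""
import math
import statistics


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom (needs >= 2 per group)."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    se2 = vx + vy
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(se2)
    df = se2 ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def zscore(values):
    """Standardize to mean 0 and sample SD 1 (fails on a constant feature)."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def total_significance(pvalues):
    """sum[-log(p)] over features, natural log assumed."""
    return sum(-math.log(p) for p in pvalues)


if __name__ == "__main__":
    # Hypothetical expression values for two genes in two classes.
    tumor = {"gene_a": [5.1, 6.0, 5.8, 6.3, 5.5], "gene_b": [2.0, 2.4, 1.9, 2.2, 2.1]}
    normal = {"gene_a": [4.2, 4.0, 4.9, 4.4, 4.1], "gene_b": [2.1, 2.3, 2.0, 2.2, 1.8]}
    pvalues = []
    for gene in tumor:
        t, df = welch_t(tumor[gene], normal[gene])
        pvalues.append(t_two_sided_p(t, df))
        # Standardize the pooled feature, then re-split it into the two classes.
        z = zscore(tumor[gene] + normal[gene])
        n = len(tumor[gene])
        t_z, _ = welch_t(z[:n], z[n:])
        print(gene, round(t, 4), round(t_z, 4), pvalues[-1])
    print("sum[-log(p)] =", total_significance(pvalues))
```

Because a z-score is a positive affine transformation, it leaves each feature's t statistic and p-value unchanged, so in this sketch standardization alters only the values fed to a classifier, not the sum[-log(p)] used to grade feature significance.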

Similar articles

Seminal quality prediction using data mining methods.
Technol Health Care. 2014;22(4):531-45. doi: 10.3233/THC-140816.

A comprehensive study of brain tumour discrimination using phase combinations, feature rankings, and hybridised classifiers.
Med Biol Eng Comput. 2020 Dec;58(12):2971-2987. doi: 10.1007/s11517-020-02273-y. Epub 2020 Oct 2.

A comparative analysis of feature selection models for spatial analysis of floods using hybrid metaheuristic and machine learning models.
Environ Sci Pollut Res Int. 2024 May;31(23):33495-33514. doi: 10.1007/s11356-024-33389-5. Epub 2024 Apr 29.

Hybrid Feature-Learning-Based PSO-PCA Feature Engineering Approach for Blood Cancer Classification.
Diagnostics (Basel). 2023 Aug 14;13(16):2672. doi: 10.3390/diagnostics13162672.

Identification of a feature selection based pattern recognition scheme for finger movement recognition from multichannel EMG signals.
Australas Phys Eng Sci Med. 2018 Jun;41(2):549-559. doi: 10.1007/s13246-018-0646-7. Epub 2018 May 9.

Classification of electrocardiogram signals with support vector machines and particle swarm optimization.
IEEE Trans Inf Technol Biomed. 2008 Sep;12(5):667-77. doi: 10.1109/TITB.2008.923147.

Cited by

Identifying Candidate Gene-Disease Associations via Graph Neural Networks.
Entropy (Basel). 2023 Jun 7;25(6):909. doi: 10.3390/e25060909.

A comprehensive survey on computational learning methods for analysis of gene expression data.
Front Mol Biosci. 2022 Nov 7;9:907150. doi: 10.3389/fmolb.2022.907150. eCollection 2022.

Predictions from algorithmic modeling result in better decisions than from data modeling for soybean iron deficiency chlorosis.
PLoS One. 2021 Jul 9;16(7):e0240948. doi: 10.1371/journal.pone.0240948. eCollection 2021.

3-Dimensional facial expression recognition in human using multi-points warping.
BMC Bioinformatics. 2019 Dec 2;20(1):619. doi: 10.1186/s12859-019-3153-2.

QCT of the proximal femur--which parameters should be measured to discriminate hip fracture?
Osteoporos Int. 2016 Mar;27(3):1137-1147. doi: 10.1007/s00198-015-3324-6. Epub 2015 Sep 28.
