An empirical study of univariate and genetic algorithm-based feature selection in binary classification with microarray data.

Authors

Lecocke Michael, Hess Kenneth

Affiliation

Department of Mathematics, St. Mary's University, San Antonio, Texas 78228, USA.

Publication

Cancer Inform. 2007 Feb 23;2:313-27.

PMID: 19458774
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2675488/

Abstract

BACKGROUND

We consider both univariate and multivariate feature selection for binary classification with microarray data. The aim is to determine whether the more sophisticated multivariate approach yields lower misclassification error rates, since it can consider jointly significant subsets of genes, without overfitting the data.

METHODS

We present an empirical study in which 10-fold cross-validation is applied externally to one univariate feature selection process and two multivariate, genetic algorithm (GA)-based feature selection processes. These procedures are applied to three supervised learning algorithms and six published two-class microarray datasets.
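
To make "applied externally" concrete, the sketch below re-runs feature selection inside every training fold, so the held-out samples never influence which genes are chosen. It is a minimal illustration under stated assumptions, not the authors' procedure: the t-statistic ranking, the nearest-centroid classifier, the unstratified folds, the choice of 10 genes, and the synthetic pure-noise data are all stand-ins, since the abstract does not name the univariate statistic, the three learning algorithms, or the subset sizes. The `external=False` mode selects genes once on all samples, for contrast.

```python
import random

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def top_k_features(X, y, k):
    """Rank features by absolute two-sample t statistic; return the k best indices."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        scores.append(abs(mean(a) - mean(b)) / (se + 1e-12))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def cv_error(X, y, k=10, n_folds=10, external=True, seed=0):
    """Cross-validated misclassification rate of a nearest-centroid classifier.

    external=True  : features are re-selected inside every training fold.
    external=False : features are selected once on all samples (optimistic).
    """
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]  # unstratified folds
    fixed = None if external else top_k_features(X, y, k)
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        feats = top_k_features(X_tr, y_tr, k) if external else fixed
        # Class centroids restricted to the selected features.
        centroids = {
            c: [mean([r[j] for r, label in zip(X_tr, y_tr) if label == c]) for j in feats]
            for c in (0, 1)
        }
        for i in fold:
            dist = {
                c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroids[c]))
                for c in (0, 1)
            }
            errors += min(dist, key=dist.get) != y[i]
    return errors / len(y)

# Hypothetical data: 40 samples, 500 "genes" of pure noise, balanced labels.
rng = random.Random(42)
n_samples, n_genes = 40, 500
X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
y = [i % 2 for i in range(n_samples)]

external_err = cv_error(X, y, external=True)
internal_err = cv_error(X, y, external=False)
print(f"external CV error: {external_err:.3f}")
print(f"internal CV error: {internal_err:.3f}")
```

Because the labels here are unrelated to the features, no classifier can truly do better than chance. Selecting genes on all samples before splitting typically produces an estimate well below that, while the external estimate stays near chance. The gap between the two is one common way to think about selection bias, though the abstract itself does not spell out how its bias estimates are computed.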

RESULTS

Across all datasets and learning algorithms, the average 10-fold external cross-validation error rates for the univariate, single-stage GA, and two-stage GA processes are 14.2%, 14.6%, and 14.2%, respectively. We also find that the optimism bias estimates from the GA analyses were half those of the univariate approach, but the selection bias estimates from the GA analyses were 2.5 times those of the univariate approach.

CONCLUSIONS

We find that the 10-fold external cross-validation misclassification error rates were closely comparable across the three processes. Further, the two-stage GA approach did not demonstrate a significant advantage over the single-stage approach. We also find that the univariate approach had higher optimism bias and lower selection bias than both GA approaches.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7445/2675488/3378dedf57f2/CIN-02-313-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7445/2675488/dcdcec65e751/CIN-02-313-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7445/2675488/85ddab316156/CIN-02-313-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7445/2675488/c57b70eb4a28/CIN-02-313-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7445/2675488/3d13f4eb8ccc/CIN-02-313-g004.jpg
