Reliable gene signatures for microarray classification: assessment of stability and performance.

Author information

Davis Chad A, Gerick Fabian, Hintermair Volker, Friedel Caroline C, Fundel Katrin, Küffner Robert, Zimmer Ralf

Affiliation

Institute of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17 80333 Munich, Germany.

Publication information

Bioinformatics. 2006 Oct 1;22(19):2356-63. doi: 10.1093/bioinformatics/btl400. Epub 2006 Jul 31.

DOI:10.1093/bioinformatics/btl400

PMID:16882647

Abstract

MOTIVATION

Two important questions for the analysis of gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) exhibiting the differences between classes and sample subsets. Solutions to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method needs to be specifically selected for a given dataset. The selected gene signatures can be unstable and the resulting classification accuracy unreliable, particularly when considering different subsets of samples. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions.

METHODS

We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, for random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting those genes frequently selected over many samplings. Sampling additionally permits measurement of the stability of the classification performance for each model, which serves as a measure of model reliability.
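The sampling procedure described above can be illustrated with a minimal sketch. This is not the authors' R package. It assumes a single hypothetical model (gene ranking by absolute difference of class means, paired with a nearest-centroid classifier), two classes, and synthetic toy data. All function names, the 70% training fraction, and the 90% selection-frequency threshold are assumptions made for illustration. The abstract does not define the model score, so the sketch only reports the mean and standard deviation of accuracy over samplings.

```python
import random
import statistics
from collections import Counter


def make_toy_data(n_per_class=20, n_genes=30, informative=(0, 1, 2), shift=3.0, seed=1):
    """Synthetic two-class expression matrix; only `informative` genes differ."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in informative:
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


def select_top_genes(X, y, k):
    """Rank genes by absolute difference of the two class means; keep the top k."""
    def score(g):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        return abs(statistics.fmean(a) - statistics.fmean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, X_test, genes):
    """Assign each test array to the class with the closest centroid."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [row for row, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.fmean(r[g] for r in rows) for g in genes]
    preds = []
    for row in X_test:
        dist = {label: sum((row[g] - c[i]) ** 2 for i, g in enumerate(genes))
                for label, c in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def stratified_split(y, train_fraction, rng):
    """Random split that keeps every class present in the training subset."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == label]
        rng.shuffle(idx)
        cut = max(1, int(train_fraction * len(idx)))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test


def evaluate_model(X, y, k=5, n_samplings=50, train_fraction=0.7,
                   min_frequency=0.9, seed=0):
    """Repeat selection and classification over random subsets of arrays."""
    rng = random.Random(seed)
    accuracies, counts = [], Counter()
    for _ in range(n_samplings):
        train, test = stratified_split(y, train_fraction, rng)
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        # Genes are selected on the training subset only, so the held-out
        # arrays do not leak into the signature.
        genes = select_top_genes(X_tr, y_tr, k)
        counts.update(genes)
        preds = nearest_centroid_predict(X_tr, y_tr, [X[i] for i in test], genes)
        correct = sum(p == y[i] for p, i in zip(preds, test))
        accuracies.append(correct / len(test))
    # Consensus signature: genes selected in at least min_frequency of samplings.
    consensus = sorted(g for g, c in counts.items() if c / n_samplings >= min_frequency)
    return {
        "mean_accuracy": statistics.fmean(accuracies),
        "sd_accuracy": statistics.stdev(accuracies),  # spread as a stability proxy
        "consensus": consensus,
    }


X, y = make_toy_data()
result = evaluate_model(X, y)
print(result)
```

To compare several models as the abstract describes, one would call `evaluate_model` once per combination of gene selection and classification method and compare the resulting accuracy distributions. A low standard deviation across samplings suggests a more reliable model.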

RESULTS

We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition to reliable classification performance, we determined stable consensus signatures (i.e. gene lists) for sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage. We compared our approach to others based on a publicly available dataset on breast cancer.

AVAILABILITY

R package at http://www.bio.ifi.lmu.de/~davis/edaprakt
