BagBoosting for tumor classification with gene expression data.

Author information

Dettling Marcel

Affiliation

Seminar für Statistik, ETH Zürich, CH-8092 Switzerland.

Publication information

Bioinformatics. 2004 Dec 12;20(18):3583-93. doi: 10.1093/bioinformatics/bth447. Epub 2004 Oct 5.

DOI:10.1093/bioinformatics/bth447

PMID:15466910

Abstract

MOTIVATION

Microarray experiments are expected to contribute significantly to progress in cancer treatment by enabling precise and early diagnosis. They create a need for class prediction tools that can deal with a large number of highly correlated input variables, perform feature selection, and provide class probability estimates that quantify the predictive uncertainty. A very promising solution is to combine the two ensemble schemes, bagging and boosting, into a novel algorithm called BagBoosting.

RESULTS

When bagging is used as a module in boosting, the resulting classifier consistently improves the predictive performance and the probability estimates of both bagging and boosting on real and simulated gene expression data. This quasi-guaranteed improvement can be obtained by simply making a bigger computing effort. The advantageous predictive potential is also confirmed by comparing BagBoosting to several established class prediction tools for microarray data.

AVAILABILITY

Software for the modified boosting algorithms, for benchmark studies and for the simulation of microarray data is available as an R package under the GNU public license at http://stat.ethz.ch/~dettling/bagboost.html.
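The central idea in the abstract, using bagging as the base-learner module inside boosting, can be illustrated with a minimal sketch. The abstract does not state the base learner or the loss function, so this example assumes regression stumps and squared-error (L2) boosting on labels coded -1/+1; that choice is an assumption made for brevity and is not necessarily what the paper or its R package uses. All function and parameter names below are hypothetical.

```python
import random


def fit_stump(X, r, tol=1e-12):
    """Fit a least-squares regression stump to targets r.

    Returns (feature, threshold, left_mean, right_mean).
    """
    n = len(X)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            ml = sum(left) / len(left)
            mr = sum(right) / len(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            # The tolerance keeps floating-point noise from flipping ties.
            if best is None or sse < best[0] - tol:
                best = (sse, j, t, ml, mr)
    if best is None:  # no feature varies in this sample: constant fit
        mean = sum(r) / n
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def fit_bagged_stumps(X, r, n_bags, rng):
    """Bagging module: one stump per bootstrap resample of (X, r)."""
    n = len(X)
    stumps = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [r[i] for i in idx]))
    return stumps


def predict_bag(stumps, row):
    """Average the predictions of all stumps in one bag."""
    return sum(predict_stump(s, row) for s in stumps) / len(stumps)


def bagboost_fit(X, y, n_rounds=10, n_bags=10, shrinkage=0.5, seed=0):
    """Assumed L2 boosting loop whose base learner is a bagged stump."""
    rng = random.Random(seed)
    F = [0.0] * len(X)  # current ensemble score per training sample
    rounds = []
    for _ in range(n_rounds):
        residuals = [y[i] - F[i] for i in range(len(X))]
        bag = fit_bagged_stumps(X, residuals, n_bags, rng)
        for i, row in enumerate(X):
            F[i] += shrinkage * predict_bag(bag, row)
        rounds.append(bag)
    return {"rounds": rounds, "shrinkage": shrinkage}


def bagboost_predict(model, row):
    """Return the predicted label, -1 or +1, from the sign of the score."""
    score = sum(model["shrinkage"] * predict_bag(bag, row)
                for bag in model["rounds"])
    return 1 if score >= 0 else -1
```

Calling `bagboost_fit(X, y)` with `X` as a list of expression vectors and `y` as -1/+1 labels returns a model that `bagboost_predict(model, row)` can apply to a new sample. Raising `n_bags` or `n_rounds` spends more computation on the same data, which is the trade-off the abstract refers to when it says the improvement comes from a bigger computing effort.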

Similar articles

BagBoosting for tumor classification with gene expression data.

Bioinformatics. 2004 Dec 12;20(18):3583-93. doi: 10.1093/bioinformatics/bth447. Epub 2004 Oct 5.

A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis.

Bioinformatics. 2005 Mar 1;21(5):631-43. doi: 10.1093/bioinformatics/bti033. Epub 2004 Sep 16.

Boosting for tumor classification with gene expression data.

Bioinformatics. 2003 Jun 12;19(9):1061-9. doi: 10.1093/bioinformatics/btf867.

Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data.

BMC Bioinformatics. 2007 Feb 28;8:67. doi: 10.1186/1471-2105-8-67.

Eigengene-based linear discriminant model for tumor classification using gene expression microarray data.

Bioinformatics. 2006 Nov 1;22(21):2635-42. doi: 10.1093/bioinformatics/btl442. Epub 2006 Aug 22.

A simple and efficient algorithm for gene selection using sparse logistic regression.

Bioinformatics. 2003 Nov 22;19(17):2246-53. doi: 10.1093/bioinformatics/btg308.

An efficient semi-unsupervised gene selection method via spectral biclustering.

IEEE Trans Nanobioscience. 2006 Jun;5(2):110-4. doi: 10.1109/tnb.2006.875040.

Ensemble dependence model for classification and prediction of cancer and normal gene expression data.

Bioinformatics. 2005 Jul 15;21(14):3114-21. doi: 10.1093/bioinformatics/bti483. Epub 2005 May 6.

Multiclass cancer classification and biomarker discovery using GA-based algorithms.

Bioinformatics. 2005 Jun 1;21(11):2691-7. doi: 10.1093/bioinformatics/bti419. Epub 2005 Apr 6.

Bagging linear sparse Bayesian learning models for variable selection in cancer diagnosis.

IEEE Trans Inf Technol Biomed. 2007 May;11(3):338-47. doi: 10.1109/titb.2006.889702.

Cited by

Sparse vertex discriminant analysis: Variable selection for biomedical classification applications.

Comput Stat Data Anal. 2025 Jun;206. doi: 10.1016/j.csda.2025.108125. Epub 2025 Jan 7.

Subject clustering by IF-PCA and several recent methods.

Front Genet. 2023 May 23;14:1166404. doi: 10.3389/fgene.2023.1166404. eCollection 2023.

The ability to classify patients based on gene-expression data varies by algorithm and performance metric.

PLoS Comput Biol. 2022 Mar 11;18(3):e1009926. doi: 10.1371/journal.pcbi.1009926. eCollection 2022 Mar.

Sci Rep. 2021 Nov 3;11(1):21609. doi: 10.1038/s41598-021-00678-9.

selectBoost: a general algorithm to enhance the performance of variable selection methods.

Bioinformatics. 2021 May 5;37(5):659-668. doi: 10.1093/bioinformatics/btaa855.

The Univariate Flagging Algorithm (UFA): An interpretable approach for predictive modeling.

PLoS One. 2019 Oct 11;14(10):e0223161. doi: 10.1371/journal.pone.0223161. eCollection 2019.

Multi-Class Neural Networks to Predict Lung Cancer.

J Med Syst. 2019 May 31;43(7):211. doi: 10.1007/s10916-019-1355-9.

Multiple Human-Behaviour Indicators for Predicting Lung Cancer Mortality with Support Vector Machine.

Sci Rep. 2018 Nov 9;8(1):16596. doi: 10.1038/s41598-018-34945-z.

Random Effects Model for Multiple Pathway Analysis with Applications to Type II Diabetes Microarray Data.

Stat Biosci. 2015 Oct 1;7(2):167-186. doi: 10.1007/s12561-014-9109-1. Epub 2014 Jan 30.

Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations.

Biomed Res Int. 2015;2015:491502. doi: 10.1155/2015/491502. Epub 2015 Oct 11.
