A classification framework applied to cancer gene expression profiles.

Affiliation

Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.

Publication information

J Healthc Eng. 2013;4(2):255-83. doi: 10.1260/2040-2295.4.2.255.

PMID:23778014

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3873740/

Abstract

Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes, or between normal and cancer samples, is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that best differentiates between cancer subtypes or between normal and cancer samples. Applying different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) to five cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally achieve better classification performance than the other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases prediction accuracy compared with using gene expression alone.
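The two-step gene selection described in the abstract (an attribute-estimation filter such as ReliefF, followed by a genetic algorithm) can be illustrated with the minimal, standard-library Python sketch below. It is not the authors' implementation: the abstract does not give the distance metric, neighbour count, GA operators, or the classifier used to score gene subsets, so this sketch assumes a simplified ReliefF with k = 3 nearest hits and misses, leave-one-out 1-nearest-neighbour accuracy as the GA fitness, truncation selection, one-point crossover, and bit-flip mutation. The function names (`relieff_weights`, `loo_1nn_accuracy`, `ga_select`, `demo`) and the toy data are hypothetical.

```python
"""Hypothetical sketch of two-step gene selection: ReliefF-style filter, then a GA wrapper.

Assumptions (not taken from the paper): range-normalised Manhattan distance,
k = 3 neighbours, leave-one-out 1-NN accuracy as GA fitness, toy parameters.
"""
import random


def relieff_weights(X, y, k=3):
    """Simplified ReliefF: reward genes that differ on misses, penalise genes that differ on hits."""
    n, d = len(X), len(X[0])
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]
    prior = {c: y.count(c) / n for c in set(y)}

    def diff(j, a, b):
        return abs(X[a][j] - X[b][j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):  # every sample is used instead of a random subsample
        others = sorted((b for b in range(n) if b != i), key=lambda b: dist(i, b))
        hits = [b for b in others if y[b] == y[i]][:k]
        for j in range(d):
            w[j] -= sum(diff(j, i, h) for h in hits) / (max(len(hits), 1) * n)
        for c in prior:
            if c == y[i]:
                continue
            misses = [b for b in others if y[b] == c][:k]
            scale = prior[c] / (1.0 - prior[y[i]])  # weight each other class by its prior
            for j in range(d):
                w[j] += scale * sum(diff(j, i, m) for m in misses) / (max(len(misses), 1) * n)
    return w


def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to `feats`."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        nearest = min((b for b in range(len(X)) if b != i),
                      key=lambda b: sum(abs(X[i][j] - X[b][j]) for j in feats))
        correct += y[nearest] == y[i]
    return correct / len(X)


def ga_select(X, y, candidates, pop=20, gens=15, seed=0):
    """Tiny genetic algorithm over bit masks of the pre-filtered candidate genes."""
    rng = random.Random(seed)
    m = len(candidates)

    def fitness(mask):
        feats = [g for g, bit in zip(candidates, mask) if bit]
        return loo_1nn_accuracy(X, y, feats) - 0.001 * len(feats)  # small penalty on subset size

    population = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop)]
    for _ in range(gens):
        parents = sorted(population, key=fitness, reverse=True)[: pop // 2]  # keep best half
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, m) if m > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            children.append([bit ^ (rng.random() < 1.0 / m) for bit in child])  # bit-flip mutation
        population = parents + children
    best = max(population, key=fitness)
    return [g for g, bit in zip(candidates, best) if bit]


def demo():
    """Toy data: genes 0 and 1 track the class label, genes 2-9 are pure noise."""
    rng = random.Random(1)
    y = [i % 2 for i in range(40)]
    X = [[3 * c + rng.gauss(0, 0.5), -2 * c + rng.gauss(0, 0.5)]
         + [rng.gauss(0, 1) for _ in range(8)] for c in y]
    w = relieff_weights(X, y)
    top = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:5]  # step 1: filter
    genes = ga_select(X, y, top)  # step 2: GA wrapper on the filtered genes
    return top, genes, loo_1nn_accuracy(X, y, genes)


if __name__ == "__main__":
    print(demo())
```

Running the filter first keeps the GA search space small (here 5 candidate genes instead of 10), which is the point of a two-step design: the cheap filter ranks every gene once, and the more expensive wrapper only searches subsets of the survivors. The 1-NN fitness could be swapped for another classifier named in the abstract, such as a linear SVM, without changing the structure.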
