Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data.

Affiliations

Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka 1207, Bangladesh.

Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh.

Publication information

Genes (Basel). 2023 Sep 14;14(9):1802. doi: 10.3390/genes14091802.

DOI:10.3390/genes14091802

PMID:37761941

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10530870/

Abstract

Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy: 100% on the leukaemia dataset, 94.74% on the colon cancer dataset, and 94.34% on the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach detects vital genes with higher precision and stability than existing methods, with significant implications for the development of ML-based gene analysis.
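The two ideas in the abstract, rank aggregation across feature selectors and weighted average voting across classifiers, can be illustrated with a minimal sketch. The abstract does not state which selection methods EFSM combines, how ranks are aggregated, or which weights VT uses, so everything below is an assumption made for illustration: the selector names, the mean-rank aggregation rule, the scores, the probabilities, and the weights are all hypothetical, and this is not the authors' implementation. In practice the class probabilities would come from trained support vector machine, k-nearest neighbor, and decision tree models.

```python
from typing import Dict, List, Sequence


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Rank features by score: rank 1 is the highest score (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for position, feature_idx in enumerate(order, start=1):
        ranks[feature_idx] = position
    return ranks


def aggregate_ranks(score_table: Dict[str, Sequence[float]], top_k: int) -> List[int]:
    """Average each feature's rank across selectors and return the top_k indices.

    Mean-rank aggregation is an assumed rule; the abstract does not specify one.
    """
    per_method = [ranks_from_scores(s) for s in score_table.values()]
    n_features = len(per_method[0])
    mean_rank = [
        sum(r[j] for r in per_method) / len(per_method) for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: (mean_rank[j], j))
    return order[:top_k]


def weighted_soft_vote(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Weighted average of per-classifier class probabilities; return the argmax class."""
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: avg[c])


# Hypothetical relevance scores for 5 genes from three hypothetical selectors.
scores = {
    "selector_a": [0.9, 0.1, 0.5, 0.7, 0.2],
    "selector_b": [0.8, 0.3, 0.2, 0.9, 0.1],
    "selector_c": [0.6, 0.2, 0.4, 0.8, 0.3],
}
selected = aggregate_ranks(scores, top_k=2)

# Hypothetical class probabilities for one sample, as might be produced by
# an SVM, a kNN model, and a decision tree (in that order).
probas = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
predicted = weighted_soft_vote(probas, weights=[2.0, 1.0, 1.0])

print("selected gene indices:", selected)
print("predicted class:", predicted)
```

Averaging ranks rather than raw scores keeps selectors with different score scales comparable, which is one common reason to aggregate on ranks. The weights in the vote let a more reliable classifier count for more without letting it fully override the others.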

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5039/10530870/705ee6cd0cc7/genes-14-01802-g001.jpg

Similar articles

An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples.

PeerJ Comput Sci. 2021 Sep 16;7:e671. doi: 10.7717/peerj-cs.671. eCollection 2021.

Ensemble of heterogeneous classifiers for diagnosis and prediction of coronary artery disease with reduced feature subset.

Comput Methods Programs Biomed. 2021 Jan;198:105770. doi: 10.1016/j.cmpb.2020.105770. Epub 2020 Sep 30.

Novel Feature Selection and Voting Classifier Algorithms for COVID-19 Classification in CT Images.

IEEE Access. 2020 Sep 30;8:179317-179335. doi: 10.1109/ACCESS.2020.3028012. eCollection 2020.

A two-stage hybrid biomarker selection method based on ensemble filter and binary differential evolution incorporating binary African vultures optimization.

BMC Bioinformatics. 2023 Apr 4;24(1):130. doi: 10.1186/s12859-023-05247-7.

R-HEFS: Rough set based heterogeneous ensemble feature selection method for medical data classification.

Artif Intell Med. 2021 Apr;114:102049. doi: 10.1016/j.artmed.2021.102049. Epub 2021 Mar 6.

EKNN: Ensemble classifier incorporating connectivity and density into kNN with application to cancer diagnosis.

Artif Intell Med. 2021 Jan;111:101985. doi: 10.1016/j.artmed.2020.101985. Epub 2020 Nov 8.

Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods.

BMC Bioinformatics. 2022 Oct 1;23(1):410. doi: 10.1186/s12859-022-04965-8.

Ensemble Feature Learning of Genomic Data Using Support Vector Machine.

PLoS One. 2016 Jun 15;11(6):e0157330. doi: 10.1371/journal.pone.0157330. eCollection 2016.

Comparative performance analysis of binary variants of FOX optimization algorithm with half-quadratic ensemble ranking method for thyroid cancer detection.

Sci Rep. 2023 Nov 10;13(1):19598. doi: 10.1038/s41598-023-46865-8.

Cited by

Integrative Machine Learning and Bioinformatics Approach for Identifying Key Biomarkers in Gallbladder Cancer Diagnosis and Progression.

IET Syst Biol. 2025 Jan-Dec;19(1):e70022. doi: 10.1049/syb2.70022.

Biomarker-driven drug repurposing for NAFLD-associated hepatocellular carcinoma using machine learning integrated ensemble feature selection.

Front Bioinform. 2025 Apr 17;5:1522401. doi: 10.3389/fbinf.2025.1522401. eCollection 2025.

Weighted-VAE: A deep learning approach for multimodal data generation applied to experimental T. cruzi infection.

PLoS One. 2025 Mar 24;20(3):e0315843. doi: 10.1371/journal.pone.0315843. eCollection 2025.

Utilizing Deep Feature Fusion for Automatic Leukemia Classification: An Internet of Medical Things-Enabled Deep Learning Framework.

Sensors (Basel). 2024 Jul 8;24(13):4420. doi: 10.3390/s24134420.
