Optimizing cancer classification: a hybrid RDO-XGBoost approach for feature selection and predictive insights.

Affiliations

School of Advanced Science and Language, VIT Bhopal University, Kothrikalan, Sehore, Bhopal, 466114, India.

Planning Department, State Planning Institute (New Division), Lucknow, Uttar Pradesh, 226001, India.

Publication information

Cancer Immunol Immunother. 2024 Oct 9;73(12):261. doi: 10.1007/s00262-024-03843-x.

DOI: 10.1007/s00262-024-03843-x
PMID: 39382649
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11464649/

Abstract

The identification of relevant biomarkers from high-dimensional cancer data remains a significant challenge due to the complexity and heterogeneity inherent in various cancer types. Conventional feature selection methods often struggle to effectively navigate the vast solution space while maintaining high predictive accuracy. In response to these challenges, we introduce a novel feature selection approach that integrates Random Drift Optimization (RDO) with XGBoost, specifically designed to enhance the performance of cancer classification tasks. Our proposed framework not only improves classification accuracy but also offers valuable insights into the underlying biological mechanisms driving cancer progression. Through comprehensive experiments conducted on real-world cancer datasets, including Central Nervous System (CNS), Leukemia, Breast, and Ovarian cancers, we demonstrate the efficacy of our method in identifying a smaller subset of unique and relevant genes. This selection results in significantly improved classification efficiency and accuracy. When compared with popular classifiers such as Support Vector Machine, K-Nearest Neighbor, and Naive Bayes, our approach consistently outperforms these models in terms of both accuracy and F-measure metrics. For instance, our framework achieved an accuracy of 97.24% in the CNS dataset, 99.14% in Leukemia, 95.21% in Ovarian, and 87.62% in Breast cancer, showcasing its robustness and effectiveness across different types of cancer data. These results underline the potential of our RDO-XGBoost framework as a promising solution for feature selection in cancer data analysis, offering enhanced predictive performance and valuable biological insights.
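
The wrapper-style loop the abstract describes (search over gene subsets, score each subset with a classifier, report accuracy and F-measure) can be sketched roughly as follows. This is only an illustration under loud assumptions: the abstract does not give the RDO update rule or the XGBoost settings, so the sketch swaps in a random bit-flip hill climb for RDO and a nearest-centroid classifier for XGBoost. All function names, the size penalty, and the synthetic data are hypothetical and are not taken from the paper.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_measure(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def centroid_predict(train_X, train_y, test_X, mask):
    """Nearest-centroid classifier on the selected columns (stand-in for XGBoost)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def dist(x, centre):
        return sum((x[j] - m) ** 2 for j, m in zip(cols, centre))

    return [min(centroids, key=lambda c: dist(x, centroids[c])) for x in test_X]

def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy minus a small (assumed) penalty on subset size."""
    if not any(mask):
        return -1.0  # an empty subset is never acceptable
    (train_X, train_y), (valid_X, valid_y) = train, valid
    preds = centroid_predict(train_X, train_y, valid_X, mask)
    return accuracy(valid_y, preds) - penalty * sum(mask) / len(mask)

def select_features(train, valid, n_features, iterations=200, seed=0):
    """Random bit-flip hill climb over feature masks (stand-in for RDO)."""
    rng = random.Random(seed)
    mask = [True] * n_features          # start from the full feature set
    score = fitness(mask, train, valid)
    for _ in range(iterations):
        candidate = mask[:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]  # toggle one feature in or out
        candidate_score = fitness(candidate, train, valid)
        if candidate_score >= score:     # keep moves that do not hurt
            mask, score = candidate, candidate_score
    return mask, score

def make_toy_data(n_samples, n_features=10, seed=1):
    """Synthetic two-class data: columns 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_features - 2)]
        X.append(row)
        y.append(label)
    return X, y

if __name__ == "__main__":
    train = make_toy_data(60, seed=1)
    valid = make_toy_data(40, seed=2)
    mask, score = select_features(train, valid, n_features=10)
    selected = [j for j, keep in enumerate(mask) if keep]
    preds = centroid_predict(train[0], train[1], valid[0], mask)
    print("selected columns:", selected)
    print("accuracy:", accuracy(valid[1], preds),
          "F-measure:", f_measure(valid[1], preds))
```

Because a move is accepted only when the penalised score does not drop, the search tends to shed features that add nothing to validation accuracy, which mirrors the abstract's goal of a smaller gene subset. A real replication would substitute the paper's RDO procedure, an XGBoost classifier, and cross-validation on the actual expression datasets.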

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5720/11464649/1dbbf407d4a4/262_2024_3843_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5720/11464649/944ba8e2aba2/262_2024_3843_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5720/11464649/99b9c80ebaec/262_2024_3843_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5720/11464649/7067f8da974c/262_2024_3843_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5720/11464649/987d9902e20b/262_2024_3843_Figa_HTML.jpg

Similar articles

1. Optimizing cancer classification: a hybrid RDO-XGBoost approach for feature selection and predictive insights.
Cancer Immunol Immunother. 2024 Oct 9;73(12):261. doi: 10.1007/s00262-024-03843-x.
2. Feature weight estimation for gene selection: a local hyperlinear learning approach.
BMC Bioinformatics. 2014 Mar 14;15:70. doi: 10.1186/1471-2105-15-70.
3. A discrete wavelet based feature extraction and hybrid classification technique for microarray data analysis.
ScientificWorldJournal. 2014;2014:195470. doi: 10.1155/2014/195470. Epub 2014 Aug 6.
4. C-HMOSHSSA: Gene selection for cancer classification using multi-objective meta-heuristic and machine learning methods.
Comput Methods Programs Biomed. 2019 Sep;178:219-235. doi: 10.1016/j.cmpb.2019.06.029. Epub 2019 Jun 29.
5. A Tri-Stage Wrapper-Filter Feature Selection Framework for Disease Classification.
Sensors (Basel). 2021 Aug 18;21(16):5571. doi: 10.3390/s21165571.
6. Identification of potential biomarkers on microarray data using distributed gene selection approach.
Math Biosci. 2019 Sep;315:108230. doi: 10.1016/j.mbs.2019.108230. Epub 2019 Jul 18.
7. Optimizing cancer diagnosis: A hybrid approach of genetic operators and Sinh Cosh Optimizer for tumor identification and feature gene selection.
Comput Biol Med. 2024 Sep;180:108984. doi: 10.1016/j.compbiomed.2024.108984. Epub 2024 Aug 10.
8. Comparison of Classification Success Rates of Different Machine Learning Algorithms in the Diagnosis of Breast Cancer.
Asian Pac J Cancer Prev. 2022 Oct 1;23(10):3287-3297. doi: 10.31557/APJCP.2022.23.10.3287.
9. Development of an efficient novel method for coronary artery disease prediction using machine learning and deep learning techniques.
Technol Health Care. 2024;32(6):4545-4569. doi: 10.3233/THC-240740.
10. Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods.
BMC Bioinformatics. 2022 Oct 1;23(1):410. doi: 10.1186/s12859-022-04965-8.

Cited by

1. [Not Available].
Glob Reg Health Technol Assess. 2025 Sep 5;12:198-204. doi: 10.33393/grhta.2025.3568. eCollection 2025 Jan-Dec.
2. Predicting cancer risk using machine learning on lifestyle and genetic data.
Sci Rep. 2025 Aug 19;15(1):30458. doi: 10.1038/s41598-025-15656-8.
3. Identification of biomarkers associated with M1 macrophages in the ST-segment elevation myocardial infarction through bioinformatics and machine learning approaches.
