
Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer.

Affiliations

School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.

Publication information

Lab Invest. 2024 Mar;104(3):100320. doi: 10.1016/j.labinv.2023.100320. Epub 2023 Dec 28.

DOI: 10.1016/j.labinv.2023.100320
PMID: 38158124

Abstract

Despite the use of machine learning tools, it is challenging to properly model cause-specific deaths in colorectal cancer (CRC) patients and choose appropriate treatments. Here, we propose an interesting feature selection framework, namely union with recursive feature elimination (U-RFE), to select the union feature sets that are crucial in CRC progression-specific mortality using The Cancer Genome Atlas (TCGA) dataset. Based on the union feature sets, we compared the performance of 5 classification algorithms, including logistic regression (LR), support vector machines (SVM), random forest (RF), eXtreme gradient boosting (XGBoost), and Stacking, to identify the best model for classifying 4-category deaths. In the first stage of U-RFE, LR, SVM, and RF were used as base estimators to obtain subsets containing the same number of features but not exactly the same specific features. Union analysis of the subsets was then performed to determine the final union feature set, effectively combining the advantages of different algorithms. We found that the U-RFE framework could improve various models' performance. Stacking outperformed LR, SVM, RF, and XGBoost in most scenarios. When the target feature number of the RFE was set to 50 and the union feature set contained 298 deterministic features, the Stacking model achieved F1_weighted, Recall_weighted, Precision_weighted, Accuracy, and Matthews correlation coefficient of 0.851, 0.864, 0.854, 0.864, and 0.717, respectively. The performance of the minority categories was also significantly improved. Therefore, this recursive feature elimination-based approach of feature selection improves performances of classifying CRC deaths using clinical and omics data or those using other data with high feature redundancy and imbalance.
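
The two-stage idea in the abstract (run RFE separately with LR, SVM, and RF, take the union of the selected subsets, then train a Stacking classifier on that union) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the synthetic data from `make_classification`, the helper name `union_rfe`, the hyperparameters, the choice of base learners and meta-learner for the Stacking model, and the small target of 5 features per estimator are all assumptions made to keep the example fast and self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def union_rfe(X, y, n_features, step=0.2):
    """Hypothetical helper: indices kept by at least one of three RFE runs."""
    base_estimators = [
        LogisticRegression(max_iter=1000),
        SVC(kernel="linear"),  # a linear kernel exposes coef_, which RFE needs
        RandomForestClassifier(n_estimators=50, random_state=0),
    ]
    selected = np.zeros(X.shape[1], dtype=bool)
    for estimator in base_estimators:
        rfe = RFE(estimator, n_features_to_select=n_features, step=step)
        # Each run keeps the same number of features, but not
        # necessarily the same ones; OR-ing the masks forms the union.
        selected |= rfe.fit(X, y).support_
    return np.flatnonzero(selected)


# Synthetic, imbalanced 4-class data standing in for clinical/omics features.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8, n_redundant=10,
    n_classes=4, n_clusters_per_class=1,
    weights=[0.55, 0.25, 0.12, 0.08], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stage 1: feature selection on the training split only.
union_idx = union_rfe(X_train, y_train, n_features=5)

# Stage 2: a Stacking classifier trained on the union feature set.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(kernel="linear")),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train[:, union_idx], y_train)
y_pred = stack.predict(X_test[:, union_idx])

f1_w = f1_score(y_test, y_pred, average="weighted")
mcc = matthews_corrcoef(y_test, y_pred)
print(f"union size: {len(union_idx)}")
print(f"F1_weighted: {f1_w:.3f}, MCC: {mcc:.3f}")
```

In this sketch the union can hold at most three times the per-estimator target, and fewer when the estimators agree. Selection is fitted on the training split only so that the held-out scores are not inflated by information from the test samples.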

Similar articles

1. Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer.
   Lab Invest. 2024 Mar;104(3):100320. doi: 10.1016/j.labinv.2023.100320. Epub 2023 Dec 28.
2. Prediction and feature selection of low birth weight using machine learning algorithms.
   J Health Popul Nutr. 2024 Oct 12;43(1):157. doi: 10.1186/s41043-024-00647-8.
3. Robust biomarker screening from gene expression data by stable machine learning-recursive feature elimination methods.
   Comput Biol Chem. 2022 Oct;100:107747. doi: 10.1016/j.compbiolchem.2022.107747. Epub 2022 Jul 29.
4. An Efficient Feature Selection Strategy Based on Multiple Support Vector Machine Technology with Gene Expression Data.
   Biomed Res Int. 2018 Aug 30;2018:7538204. doi: 10.1155/2018/7538204. eCollection 2018.
5. Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: random forest and multinomial logistic regression models.
   Lab Invest. 2022 Mar;102(3):236-244. doi: 10.1038/s41374-021-00662-x. Epub 2021 Sep 18.
6. Classification of pulmonary lesion based on multiparametric MRI: utility of radiomics and comparison of machine learning methods.
   Eur Radiol. 2020 Aug;30(8):4595-4605. doi: 10.1007/s00330-020-06768-y. Epub 2020 Mar 28.
7. Predictive model and risk analysis for peripheral vascular disease in type 2 diabetes mellitus patients using machine learning and shapley additive explanation.
   Front Endocrinol (Lausanne). 2024 Feb 28;15:1320335. doi: 10.3389/fendo.2024.1320335. eCollection 2024.
8. Classification and prediction of spinal disease based on the SMOTE-RFE-XGBoost model.
   PeerJ Comput Sci. 2023 Mar 10;9:e1280. doi: 10.7717/peerj-cs.1280. eCollection 2023.
9. SVM-T-RFE: a novel gene selection algorithm for identifying metastasis-related genes in colorectal cancer using gene expression profiles.
   Biochem Biophys Res Commun. 2012 Mar 9;419(2):148-53. doi: 10.1016/j.bbrc.2012.01.087. Epub 2012 Jan 28.
10. Multimodality radiomics prediction of radiotherapy-induced the early proctitis and cystitis in rectal cancer patients: a machine learning study.
    Biomed Phys Eng Express. 2023 Dec 20;10(1). doi: 10.1088/2057-1976/ad0f3e.

Cited by

1. An interpretable machine learning approach for predicting drug-resistant epilepsy in children with tuberous sclerosis complex.
   Front Neurol. 2025 Aug 4;16:1623212. doi: 10.3389/fneur.2025.1623212. eCollection 2025.
2. Towards machine learning fairness in classifying multicategory causes of deaths in colorectal or lung cancer patients.
   Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf398.
3. Machine learning algorithm based on combined clinical indicators for the prediction of infertility and pregnancy loss.
   Front Endocrinol (Lausanne). 2025 Jul 18;16:1544724. doi: 10.3389/fendo.2025.1544724. eCollection 2025.
4. Normalization and Selecting Non-Differentially Expressed Genes Improve Machine Learning Modelling of Cross-Platform Transcriptomic Data.
   Trans Artif Intell. 2025;1(1). doi: 10.53941/tai.2025.100005. Epub 2025 May 25.
5. Early detection of mental health disorders using machine learning models using behavioral and voice data analysis.
   Sci Rep. 2025 May 13;15(1):16518. doi: 10.1038/s41598-025-00386-8.
6. Intelligent predictive risk assessment and management of sarcopenia in chronic disease patients using machine learning and a web-based tool.
   Eur J Med Res. 2025 Apr 29;30(1):345. doi: 10.1186/s40001-025-02606-3.
7. Towards machine learning fairness in classifying multicategory causes of deaths in colorectal or lung cancer patients.
   bioRxiv. 2025 Feb 19:2025.02.14.638368. doi: 10.1101/2025.02.14.638368.
8. Normalization and selecting non-differentially expressed genes improve machine learning modelling of cross-platform transcriptomic data.
   ArXiv. 2025 Jan 24:arXiv:2501.14248v1.
9. Construction of a prognostic prediction model for colorectal cancer based on 5-year clinical follow-up data.
   Sci Rep. 2025 Jan 21;15(1):2701. doi: 10.1038/s41598-025-86872-5.
10. Improving model performance in mapping black-soil resource with machine learning methods and multispectral features.
    Sci Rep. 2025 Jan 7;15(1):1199. doi: 10.1038/s41598-024-82399-3.