A robust ensemble feature selection approach to prioritize genes associated with survival outcome in high-dimensional gene expression data.

Authors

Le Phi, Gong Xingyue, Ung Leah, Yang Hai, Keenan Bridget P, Zhang Li, He Tao

Affiliations

Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, San Francisco, CA, United States.

Department of Physiological Nursing, School of Nursing, University of California, San Francisco, San Francisco, CA, United States.

Publication information

Front Syst Biol. 2024;4. doi: 10.3389/fsysb.2024.1355595. Epub 2024 Mar 20.

DOI:10.3389/fsysb.2024.1355595
PMID:39897528
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11786965/
Abstract

Exploring features associated with the clinical outcome of interest is a rapidly advancing area of research. However, with contemporary sequencing technologies capable of identifying over thousands of genes per sample, there is a challenge in constructing efficient prediction models that balance accuracy and resource utilization. To address this challenge, researchers have developed feature selection methods to enhance performance, reduce overfitting, and ensure resource efficiency. However, applying feature selection models to survival analysis, particularly in clinical datasets characterized by substantial censoring and limited sample sizes, introduces unique challenges. We propose a robust ensemble feature selection approach integrated with group Lasso to identify compelling features and evaluate its performance in predicting survival outcomes. Our approach consistently outperforms established models across various criteria through extensive simulations, demonstrating low false discovery rates, high sensitivity, and high stability. Furthermore, we applied the approach to a colorectal cancer dataset from The Cancer Genome Atlas, showcasing its effectiveness by generating a composite score based on the selected genes to correctly distinguish different subtypes of the patients. In summary, our proposed approach excels in selecting impactful features from high-dimensional data, yielding better outcomes compared to contemporary state-of-the-art models.
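The abstract evaluates feature selection by false discovery rate, sensitivity, and stability. The sketch below shows one common way to compute such criteria from sets of selected gene names. It is an illustration only: the function names and toy gene labels are hypothetical, and measuring stability as the mean pairwise Jaccard index is an assumption, since the abstract does not state which stability measure the authors used.

```python
from itertools import combinations


def false_discovery_rate(selected, truth):
    """Fraction of selected features that are not truly associated."""
    selected, truth = set(selected), set(truth)
    if not selected:
        return 0.0
    return len(selected - truth) / len(selected)


def sensitivity(selected, truth):
    """Fraction of truly associated features that were selected."""
    selected, truth = set(selected), set(truth)
    if not truth:
        return 0.0
    return len(selected & truth) / len(truth)


def jaccard_stability(selections):
    """Mean pairwise Jaccard index across repeated selection runs.

    Assumption: this is one common stability measure, not necessarily
    the one used in the paper.
    """
    sets = [set(s) for s in selections]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0
    scores = []
    for a, b in pairs:
        union = a | b
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical gene labels (not data from the paper).
truth = {"g1", "g2", "g3", "g4"}
runs = [
    {"g1", "g2", "g3", "g9"},
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g4", "g9"},
]

fdr_first = false_discovery_rate(runs[0], truth)
sens_first = sensitivity(runs[0], truth)
stability = jaccard_stability(runs)
```

In a simulation study the true feature set is known, so false discovery rate and sensitivity can be computed per run; stability needs only the selections themselves and can therefore also be reported on real data.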

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f02/12342029/d3b930f0c6b3/fsysb-04-1355595-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f02/12342029/11d566c863ec/fsysb-04-1355595-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f02/12342029/f51e3686db0e/fsysb-04-1355595-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f02/12342029/3c66f5937686/fsysb-04-1355595-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f02/12342029/904f3e12859c/fsysb-04-1355595-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f02/12342029/e7146709bfc7/fsysb-04-1355595-g006.jpg
