A robust ensemble feature selection approach to prioritize genes associated with survival outcome in high-dimensional gene expression data.

Author information

Le Phi, Gong Xingyue, Ung Leah, Yang Hai, Keenan Bridget P, Zhang Li, He Tao

Affiliations

Division of Hematology/Oncology, Department of Medicine, University of California, San Francisco, San Francisco, CA, United States.

Department of Physiological Nursing, School of Nursing, University of California, San Francisco, San Francisco, CA, United States.

Publication information

Front Syst Biol. 2024;4. doi: 10.3389/fsysb.2024.1355595. Epub 2024 Mar 20.

Abstract

Exploring features associated with a clinical outcome of interest is a rapidly advancing area of research. However, because contemporary sequencing technologies can measure thousands of genes per sample, it is challenging to construct efficient prediction models that balance accuracy and resource utilization. To address this challenge, researchers have developed feature selection methods that enhance performance, reduce overfitting, and ensure resource efficiency. Applying feature selection models to survival analysis introduces unique challenges of its own, particularly in clinical datasets characterized by substantial censoring and limited sample sizes. We propose a robust ensemble feature selection approach integrated with group Lasso to identify compelling features, and we evaluate its performance in predicting survival outcomes. In extensive simulations, our approach consistently outperforms established models across various criteria, demonstrating low false discovery rates, high sensitivity, and high stability. Furthermore, we applied the approach to a colorectal cancer dataset from The Cancer Genome Atlas, showcasing its effectiveness by generating a composite score from the selected genes that correctly distinguishes patient subtypes. In summary, our proposed approach excels at selecting impactful features from high-dimensional data, yielding better outcomes than contemporary state-of-the-art models.
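The abstract does not spell out how the ensemble combines its base selectors or how the composite score is computed. As an illustration only, the sketch below shows one common way such pieces can fit together: features are kept if they are chosen in a large enough fraction of base runs, and a composite score is a weighted sum of the selected genes' expression values. The function names, the frequency threshold, the example runs, and the weights are all hypothetical, and this is not the authors' group Lasso procedure.

```python
from collections import Counter


def selection_frequency(runs, n_features):
    """Fraction of base runs in which each feature index was selected."""
    counts = Counter(idx for run in runs for idx in set(run))
    return [counts[j] / len(runs) for j in range(n_features)]


def ensemble_select(runs, n_features, threshold=0.5):
    """Keep features selected in at least `threshold` of the runs (assumed rule)."""
    freq = selection_frequency(runs, n_features)
    return [j for j, f in enumerate(freq) if f >= threshold]


def composite_score(expression, weights):
    """Weighted sum of expression values over the selected genes."""
    return sum(w * expression[j] for j, w in weights.items())


# Hypothetical output of four base selectors on five genes (indices 0..4).
runs = [{0, 2}, {0, 2, 3}, {0, 4}, {0, 2}]
selected = ensemble_select(runs, n_features=5, threshold=0.75)
print(selected)

# Hypothetical weights (e.g. fitted coefficients) for the selected genes.
weights = {0: 0.5, 2: -1.0}
print(composite_score([1.0, 5.0, 2.0, 0.0, 0.0], weights))
```

In practice, each entry of `runs` would come from fitting a survival-aware selector on a resample or with a different base method, and the weights would come from a fitted model on the selected genes. Both are left as inputs here because the abstract does not specify them.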

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f02/12342029/d3b930f0c6b3/fsysb-04-1355595-g001.jpg
