Feature selection and survival modeling in The Cancer Genome Atlas.

Affiliation

Department of Pathology, The University of Alabama at Birmingham, Birmingham, AL, USA.

Publication information

Int J Nanomedicine. 2013;8 Suppl 1(Suppl 1):57-62. doi: 10.2147/IJN.S40733. Epub 2013 Sep 16.

Abstract

PURPOSE

Personalized medicine is predicated on identifying subgroups of a common disease so that each subgroup can be treated more effectively. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy over the optimal number of genes to use as input to a feature selection algorithm for survival modeling.

PATIENTS AND METHODS

The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas (TCGA). We compared four survival prediction methods: (1) a 1-nearest neighbor (1-NN) survival prediction method; (2) a random patient selection method; and a Cox-based regression method with nested cross-validation and least absolute shrinkage and selection operator (LASSO) optimization, using either (3) whole-genome gene expression profiles or (4) gene expression profiles of cancer pathway genes only.
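The abstract does not give implementation details for methods (1) and (2), so the following is only a minimal sketch under stated assumptions: patients are compared by Euclidean distance over their full expression profiles, and censored training patients are skipped because their true survival time is unknown (consistent with the Conclusion's mention of omitting censored data). The function names `one_nn_predict` and `random_patient_predict` and the toy data are hypothetical.

```python
import math
import random
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_nn_predict(train_x, train_time, train_event, query):
    """Hypothetical 1-NN predictor: return the observed survival time of the
    uncensored training patient whose profile is closest to `query`.
    Every gene is used; there is no feature selection step."""
    best_time, best_dist = None, math.inf
    for x, t, event in zip(train_x, train_time, train_event):
        if not event:  # censored: true survival time unknown, so skip it
            continue
        d = euclidean(x, query)
        if d < best_dist:
            best_time, best_dist = t, d
    if best_time is None:
        raise ValueError("no uncensored training patients")
    return best_time


def random_patient_predict(train_time, train_event, rng: random.Random):
    """Hypothetical baseline: return the survival time of a randomly chosen
    uncensored training patient, ignoring the query's expression profile."""
    times = [t for t, event in zip(train_time, train_event) if event]
    return rng.choice(times)


# Toy data: three patients, two genes; the third patient is censored.
train_x = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.2]]
train_time = [10.0, 50.0, 12.0]
train_event = [1, 1, 0]

print(one_nn_predict(train_x, train_time, train_event, [0.2, 0.1]))  # prints 10.0
print(random_patient_predict(train_time, train_event, random.Random(0)))
```

The query's closest profile overall is the censored third patient, but because censored patients are skipped, the prediction falls back to the first patient's 10.0. The random baseline uses no expression data at all, which is what makes it a useful floor for judging whether the 1-NN method learns anything.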

RESULTS

The 1-NN method outperformed the random patient selection method in survival prediction, even though it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data showed higher survival prediction power than the 1-NN method, but it was in turn outperformed by the same method applied to gene expression profiles of cancer pathway genes alone.
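To make the comparison between whole-genome and pathway-restricted inputs concrete, here is a minimal sketch of L1-penalized (LASSO) Cox regression. It is fit by proximal gradient descent (ISTA), swapped in for simplicity; packages such as glmnet use coordinate descent instead, and the abstract does not say which solver the authors used. The names `cox_lasso`, `cox_gradient`, the `features` argument (standing in for a list of cancer pathway gene columns), the step size, the iteration count, and the toy data are all hypothetical.

```python
import math


def cox_gradient(X, times, events, beta):
    """Gradient of the average negative Cox log partial likelihood
    (Breslow handling of ties)."""
    n, p = len(X), len(beta)
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    top = max(eta)
    w = [math.exp(e - top) for e in eta]  # shift for numerical stability
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        risk = [j for j in range(n) if times[j] >= times[i]]  # still at risk
        denom = sum(w[j] for j in risk)
        for k in range(p):
            risk_mean = sum(w[j] * X[j][k] for j in risk) / denom
            grad[k] -= (X[i][k] - risk_mean) / n
    return grad


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |z|: shrinks z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def cox_lasso(X, times, events, lam, features=None, step=0.5, n_iter=500):
    """Hypothetical L1-penalized Cox fit by proximal gradient descent (ISTA).
    `features` optionally restricts the input to a column subset, e.g. the
    indices of cancer pathway genes; returns {column index: coefficient}."""
    cols = list(range(len(X[0]))) if features is None else list(features)
    Xs = [[row[k] for k in cols] for row in X]
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        g = cox_gradient(Xs, times, events, beta)
        beta = [soft_threshold(b - step * gk, step * lam) for b, gk in zip(beta, g)]
    return dict(zip(cols, beta))


# Toy data: gene 0 rises as survival falls; gene 1 does not vary at all.
X = [[0.0, 1.0], [0.2, 1.0], [0.4, 1.0], [0.6, 1.0], [0.8, 1.0], [1.0, 1.0]]
times = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
events = [1, 1, 1, 1, 1, 1]

print(cox_lasso(X, times, events, lam=0.01))  # gene 0 positive, gene 1 exactly 0.0
print(cox_lasso(X, times, events, lam=100.0))  # penalty zeroes every coefficient
print(cox_lasso(X, times, events, lam=0.01, features=[0]))  # gene 0 only
```

A positive coefficient means higher expression is associated with higher hazard, matching how the toy data were built. The O(n²) risk-set loop favors readability over speed. Selecting `lam` by nested cross-validation, as in the Methods, would wrap this fit in an inner loop for penalty selection and an outer loop for performance estimation; that loop is omitted here.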

CONCLUSION

The 1-NN survival prediction method may require more patients to perform well, even when censored data are omitted. Using preexisting biological knowledge for survival prediction is a reasonable way to understand the biological system of a cancer, unless the goal of the analysis is to identify completely unknown genes relevant to cancer biology.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63a0/3790279/cae8593c6318/ijn-8-057Fig1.jpg
