Department of Pathology, The University of Alabama at Birmingham, Birmingham, AL, USA.
Int J Nanomedicine. 2013;8(Suppl 1):57-62. doi: 10.2147/IJN.S40733. Epub 2013 Sep 16.
Personalized medicine is predicated on the concept of identifying subgroups of a common disease for better treatment. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy over the optimal number of genes to use as input to a feature selection algorithm for survival modeling.
The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four survival prediction methods: (1) a 1-nearest neighbor (1-NN) survival prediction method; (2) a random patient selection method; (3) a Cox-based regression method with nested cross-validation and least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; and (4) the same Cox-based method using gene expression profiles of cancer pathway genes only.
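To make the two simplest methods concrete, the following is a minimal sketch of 1-NN survival prediction and the random patient selection baseline. It is an illustration only, not the authors' implementation: the Euclidean distance metric, the toy data, and the names `predict_1nn`, `predict_random`, and `drop_censored` are all assumptions introduced here, since the abstract does not specify them.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (expression profile, survival time, event observed).
# event observed = False means the patient was censored.
Patient = Tuple[Sequence[float], float, bool]


def predict_1nn(train: List[Patient], profile: Sequence[float]) -> float:
    """Predict survival as the survival time of the closest training patient.

    Assumption: closeness is Euclidean distance over the full expression
    profile, so no feature selection step is involved.
    """
    nearest = min(train, key=lambda p: math.dist(p[0], profile))
    return nearest[1]


def predict_random(train: List[Patient], rng: random.Random) -> float:
    """Baseline: return the survival time of a randomly chosen training patient."""
    return rng.choice(train)[1]


def drop_censored(train: List[Patient]) -> List[Patient]:
    """Keep only patients whose event was observed (assumed handling of censoring)."""
    return [p for p in train if p[2]]


# Toy data, made up purely for illustration (two genes per patient).
toy_train: List[Patient] = [
    ([1.0, 2.0], 30.0, True),
    ([5.0, 5.0], 12.0, True),
    ([1.2, 1.9], 48.0, False),  # censored patient
]

pred_all = predict_1nn(toy_train, [1.2, 1.9])
pred_uncensored = predict_1nn(drop_censored(toy_train), [1.2, 1.9])
pred_baseline = predict_random(toy_train, random.Random(0))
```

In this sketch, dropping censored patients changes which neighbor is selected for the query profile, which shows how strongly a 1-NN prediction depends on the size and composition of the training cohort.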
The 1-NN method performed better than the random patient selection method in terms of survival predictions, although it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method when using gene expression profiles of cancer pathway genes alone.
The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology.