Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL 32611, USA.
Breast Cancer Res Treat. 2010 Feb;119(3):593-9. doi: 10.1007/s10549-009-0365-6. Epub 2009 Mar 17.
Previous studies have demonstrated the potential value of gene expression signatures in assessing the risk of post-surgical breast cancer recurrence. However, many of these predictive models have been derived using simple computational algorithms and validated either internally or by one-way validation on a single dataset. We have recently developed a new feature selection algorithm that overcomes some limitations inherent to high-dimensional data analysis. In this study, we applied this algorithm to two publicly available gene expression datasets obtained from more than 400 patients with breast cancer to investigate whether we could derive more accurate prognostic signatures and reveal predictive factors common to independent datasets. We compared the performance of three advanced computational algorithms using a robust two-way validation method, in which one dataset was used for training and to establish a prediction model that was then blindly tested on the other dataset; the experiment was then repeated in the reverse direction. The analyses identified prognostic signatures that, although composed of only 10-13 genes, significantly outperformed previously reported signatures for breast cancer evaluation. The cross-validation approach revealed CEGP1 and PRAME as major candidates for breast cancer biomarker development.
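The two-way validation protocol described in the abstract can be illustrated with a short sketch. It is a minimal example under loudly stated assumptions: the gene filter (absolute difference of class means), the nearest-centroid classifier, the function names `select_genes`, `fit`, `predict` and `two_way_validation`, and the toy data are all hypothetical stand-ins. They are not the authors' feature selection algorithm or any of the three algorithms compared in the study. What the sketch shows is the protocol itself: gene selection and model fitting use only the training dataset, the resulting model is scored blindly on the other dataset, and the roles are then swapped.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout: each sample is (expression vector, label),
# with label 1 = recurrence and 0 = no recurrence.
Sample = Tuple[List[float], int]
Model = Tuple[List[int], Dict[int, List[float]]]


def select_genes(train: Sequence[Sample], k: int) -> List[int]:
    """Hypothetical filter: rank genes by |mean(class 1) - mean(class 0)|,
    computed on the training dataset only, and keep the top k gene indices."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    n_genes = len(train[0][0])

    def mean(rows: List[List[float]], j: int) -> float:
        return sum(r[j] for r in rows) / len(rows)

    scores = [abs(mean(pos, j) - mean(neg, j)) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


def fit(train: Sequence[Sample], k: int) -> Model:
    """Select a k-gene signature, then fit a nearest-centroid classifier on it."""
    genes = select_genes(train, k)
    centroids: Dict[int, List[float]] = {}
    for label in {y for _, y in train}:
        rows = [[x[j] for j in genes] for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return genes, centroids


def predict(model: Model, x: Sequence[float]) -> int:
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    genes, centroids = model
    sub = [x[j] for j in genes]
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, centroids[c])))


def accuracy(model: Model, test: Sequence[Sample]) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in test) / len(test)


def two_way_validation(set_a: Sequence[Sample], set_b: Sequence[Sample], k: int) -> Tuple[float, float]:
    """Train on A and test blindly on B, then repeat in the reverse direction."""
    a_to_b = accuracy(fit(set_a, k), set_b)  # set_b is never seen during fitting
    b_to_a = accuracy(fit(set_b, k), set_a)  # set_a is never seen during fitting
    return a_to_b, b_to_a


# Toy example (fabricated numbers, for illustration only): gene 0 is informative,
# genes 1 and 2 are noise.
set_a = [([5.0, 0.1, 3.0], 1), ([4.8, 0.9, 2.0], 1), ([1.0, 0.5, 2.5], 0), ([1.2, 0.2, 2.1], 0)]
set_b = [([5.2, 0.7, 2.2], 1), ([1.1, 0.3, 2.9], 0), ([4.9, 0.4, 2.4], 1), ([0.9, 0.8, 2.6], 0)]
print(two_way_validation(set_a, set_b, k=1))  # (1.0, 1.0)
```

Keeping selection inside `fit` is the design choice that matters here: if genes were ranked using both datasets before splitting, the "blind" test would leak information from the test set into the signature.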