Department of Pathology, School of Medicine, Zhejiang University, Hangzhou 310058, People's Republic of China.
Biochem Biophys Res Commun. 2012 Mar 9;419(2):148-53. doi: 10.1016/j.bbrc.2012.01.087. Epub 2012 Jan 28.
Although metastasis is the principal cause of death for colorectal cancer (CRC) patients, the molecular mechanisms underlying CRC metastasis are still not fully understood. To identify metastasis-related genes in CRC, we obtained gene expression profiles of 55 early stage primary CRCs, 56 late stage primary CRCs, and 34 metastatic CRCs from the Expression Project for Oncology (http://www.intgen.org/expo/). We developed a novel gene selection algorithm, SVM-T-RFE, which extends the support vector machine recursive feature elimination (SVM-RFE) algorithm by incorporating the t-statistic. We achieved the highest classification accuracy (100%) with smaller gene subsets (10 and 6 genes, respectively) when classifying early versus late stage primary CRCs and metastatic CRCs versus late stage primary CRCs. We also compared the performance of the SVM-T-RFE and SVM-RFE gene selection algorithms on another large-scale CRC dataset and five public microarray datasets. SVM-T-RFE outperformed SVM-RFE, identifying more differentially expressed genes and achieving the highest prediction accuracy with an equal or smaller number of selected genes. Some of the selected genes have been reported to be associated with CRC development or metastasis.
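SVM-RFE repeatedly trains a linear SVM and discards the gene with the smallest weight magnitude; according to the abstract, SVM-T-RFE also brings a t-statistic into that ranking. The abstract does not state how the two are combined, so the sketch below assumes a hypothetical score that mixes the max-normalized |w_k| and |t_k| with a weight `beta`. It also substitutes a small Pegasos-style subgradient SVM and Welch's t-statistic for whatever solver and t-test variant the authors used, and removes one gene per iteration. The function names (`train_linear_svm`, `welch_t`, `svm_t_rfe`) and all defaults are illustrative, not taken from the paper.

```python
import math
import random
from statistics import mean, variance


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (illustrative solver).

    X is a list of rows, y holds +1/-1 labels. A constant 1.0 is appended to
    each row so the last weight acts as a (regularized) bias term.
    Returns the per-gene weights with the bias dropped.
    """
    rng = random.Random(seed)
    rows = [list(r) + [1.0] for r in X]
    w = [0.0] * len(rows[0])
    step = 0
    for _ in range(epochs):
        order = list(range(len(rows)))
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            # L2 shrinkage applied on every step
            w = [(1.0 - eta * lam) * wj for wj in w]
            # hinge-loss subgradient, only for margin violators
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w[:-1]


def welch_t(pos, neg):
    """Welch two-sample t-statistic for one gene (tiny epsilon avoids /0)."""
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return (mean(pos) - mean(neg)) / (se + 1e-12)


def svm_t_rfe(X, y, n_keep, beta=0.5):
    """Recursively drop the gene with the lowest combined score.

    Hypothetical score (an assumption, not the paper's formula):
        beta * |w_k| / max|w| + (1 - beta) * |t_k| / max|t|
    Returns the original column indices of the surviving genes.
    """
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub, y)
        pos = [r for r, lab in zip(sub, y) if lab == 1]
        neg = [r for r, lab in zip(sub, y) if lab == -1]
        t = [welch_t([r[k] for r in pos], [r[k] for r in neg])
             for k in range(len(remaining))]
        w_max = max(abs(v) for v in w) or 1.0
        t_max = max(abs(v) for v in t) or 1.0
        scores = [beta * abs(wk) / w_max + (1.0 - beta) * abs(tk) / t_max
                  for wk, tk in zip(w, t)]
        # eliminate the lowest-ranked gene and retrain on the rest
        worst = min(range(len(remaining)), key=scores.__getitem__)
        remaining.pop(worst)
    return remaining


if __name__ == "__main__":
    # Toy data: gene 0 separates the classes, genes 1 and 2 are pure noise.
    rng = random.Random(1)
    X, y = [], []
    for i in range(40):
        label = 1 if i % 2 == 0 else -1
        X.append([2.0 * label + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    print(svm_t_rfe(X, y, n_keep=1))
```

On this toy dataset the procedure is expected to retain gene 0. Setting `beta=1.0` reduces the score to a plain SVM-RFE ranking by |w_k|, which makes it easy to compare the two criteria on the same data.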