Department of Gastrointestinal Surgery, Affiliated Hospital of Jiangnan University, Wuxi, Jiangsu 214062, P.R. China.
Mol Med Rep. 2020 Jan;21(1):347-359. doi: 10.3892/mmr.2019.10841. Epub 2019 Nov 21.
Gastric cancer (GC) ranks fifth in incidence and third in cancer‑related mortality worldwide. The present study was designed to construct a support vector machine (SVM) classifier and a risk score system for GC. The GSE62254 (training set) and GSE26253 (validation set 2) datasets were downloaded from the Gene Expression Omnibus database, and the gene expression profile of GC (validation set 1) was obtained from The Cancer Genome Atlas database. Differentially expressed genes (DEGs) between recurrent and non‑recurrent samples were identified using the limma package. Feature genes were selected using the caret package, and an SVM classifier was built using the e1071 package. The optimal predictive genes for constructing a risk score system were screened using the penalized package. Finally, stratification analysis of clinical factors was performed, and pathway enrichment analysis was conducted using Gene Set Enrichment Analysis. A total of 239 DEGs were identified in GSE62254, of which 114 were significantly associated with both recurrence‑free survival and overall survival. Subsequently, 21 feature genes were screened from the 114 DEGs, and an SVM classifier was built. A risk score system for survival prediction was constructed following the selection of 10 optimal genes: A‑kinase anchoring protein 12; angiopoietin‑like protein 1; cysteine‑rich sequence 1; myeloid/lymphoid or mixed‑lineage leukemia, translocated to chromosome 11; neuron navigator 3; neurobeachin; nephroblastoma overexpressed; pleiotrophin; tumor suppressor candidate 3; and zinc finger and SCAN domain containing 18. The stratification analysis revealed that pathological stage was an independent prognostic clinical factor in the high‑risk group. Additionally, eight significant pathways were associated with the 10‑gene signature. The SVM classifier and the risk score system may be applied for classifying patients with GC and predicting their prognosis, respectively.
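To illustrate how a gene-based risk score system of this kind is commonly applied, the following minimal Python sketch computes a score as a weighted sum of expression values and splits samples into high-risk and low-risk groups. It is not the authors' implementation: the gene names (GENE_A, GENE_B, GENE_C), the coefficients, the expression values, and the median cutoff are all hypothetical assumptions, since the abstract reports neither the fitted coefficients nor the cutoff used.

```python
from statistics import median

# Hypothetical coefficients for illustration only; NOT values from the study.
# In practice they would come from a fitted penalized survival model.
COEFFICIENTS = {
    "GENE_A": 0.42,
    "GENE_B": -0.17,
    "GENE_C": 0.08,
}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Weighted sum of expression values: sum(coef_gene * expr_gene)."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(samples, coefficients=COEFFICIENTS):
    """Score every sample and split at the median score.

    The median cutoff is an assumption; the abstract does not state
    how the high-risk and low-risk groups were separated.
    """
    scores = {sid: risk_score(expr, coefficients) for sid, expr in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}
    return scores, cutoff, groups


# Made-up expression values for four hypothetical samples.
samples = {
    "S1": {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 0.0},
    "S2": {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 1.0},
    "S3": {"GENE_A": 0.0, "GENE_B": 0.0, "GENE_C": 5.0},
    "S4": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 0.5},
}

scores, cutoff, groups = stratify(samples)
for sid in samples:
    print(f"{sid}: score={scores[sid]:.2f} group={groups[sid]}")
```

A positive coefficient raises the score as expression increases and a negative one lowers it, so the sign of each weight indicates whether the gene behaves as a risk or a protective factor in this simplified model.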