Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, 210009, China.
Shanghai Yunying Medical Technology Co., Ltd., Shanghai, 201612, China.
Curr Med Sci. 2021 Apr;41(2):368-374. doi: 10.1007/s11596-021-2356-8. Epub 2021 Apr 20.
Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide. Several studies have indicated that rectal cancer differs significantly from colon cancer in treatment, prognosis, and metastasis. Recently, differential mRNA expression between colon cancer and rectal cancer has received a great deal of attention. The current study aimed to identify significant differences between colon cancer and rectal cancer from RNA sequencing (RNA-seq) data using support vector machines (SVM). Here, 393 CRC samples from The Cancer Genome Atlas (TCGA) database were investigated, including 298 patients with colon cancer and 95 with rectal cancer. Following random forest (RF) analysis of the mRNA expression data, 96 genes, including HOXB13, PRAC, and BCLAF1, were identified and used to build the SVM classification model with leave-one-out cross-validation (LOOCV). In the training (n=196) and validation (n=197) cohorts, the accuracy (82.1% and 82.2%, respectively) and the AUC (0.87 and 0.91, respectively) indicated that the optimal SVM classification model distinguished colon cancer from rectal cancer reasonably well. However, additional experiments are required to validate the predicted gene expression levels and functions.
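The workflow summarized above (RF-based gene selection, then an SVM assessed with LOOCV and a held-out validation cohort) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data in place of the TCGA expression matrix, and the gene count (`N_GENES`), forest size, linear kernel, and the function name `run_sketch` are all hypothetical choices made for illustration. Only the cohort sizes (393 samples, 196/197 split) and the number of selected genes (96) come from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_SAMPLES = 393   # from the abstract
N_SELECTED = 96   # from the abstract
N_GENES = 300     # assumption: small synthetic feature count, not the real transcriptome


def run_sketch(seed=0):
    # Synthetic stand-in for the expression matrix (assumption).
    # Label 1 plays the role of the minority class (rectal cancer, about 24%).
    X, y = make_classification(
        n_samples=N_SAMPLES, n_features=N_GENES, n_informative=20,
        weights=[0.76, 0.24], random_state=seed,
    )

    # Split into training (n=196) and validation (n=197) cohorts.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, train_size=196, stratify=y, random_state=seed
    )

    # Rank features by random forest importance, using the training cohort only.
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X_train, y_train)
    top = np.argsort(rf.feature_importances_)[::-1][:N_SELECTED]

    # SVM on the selected features; the linear kernel is an assumption.
    svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))

    # Leave-one-out predictions on the training cohort. Because the features
    # were selected on this same cohort, this estimate is somewhat optimistic.
    loo_pred = cross_val_predict(svm, X_train[:, top], y_train, cv=LeaveOneOut())

    # Fit on the full training cohort and score the held-out validation cohort.
    svm.fit(X_train[:, top], y_train)
    val_pred = svm.predict(X_val[:, top])
    val_score = svm.decision_function(X_val[:, top])

    return {
        "n_train": len(y_train),
        "n_val": len(y_val),
        "selected": top,
        "loocv_accuracy": accuracy_score(y_train, loo_pred),
        "val_accuracy": accuracy_score(y_val, val_pred),
        "val_auc": roc_auc_score(y_val, val_score),
    }


if __name__ == "__main__":
    result = run_sketch()
    for key in ("loocv_accuracy", "val_accuracy", "val_auc"):
        print(f"{key}: {result[key]:.3f}")
```

The numbers this prints describe the synthetic data only and are not comparable to the reported 82.1%/82.2% accuracy or 0.87/0.91 AUC. For an unbiased LOOCV estimate, the feature selection step would need to be repeated inside each leave-one-out fold.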