Department of Oncology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Medical Centre on Ageing of Ruijin Hospital, MCARJH, Shanghai Jiaotong University School of Medicine, Shanghai, China.
BJS Open. 2023 May 5;7(3). doi: 10.1093/bjsopen/zrad031.
The aim of this study was to construct a predictive signature integrating tumour-mutation- and copy-number-variation-associated features using machine learning to precisely predict early relapse and survival in patients with resected stage I-II pancreatic ductal adenocarcinoma.
Patients with microscopically confirmed stage I-II pancreatic ductal adenocarcinoma undergoing R0 resection at the Chinese PLA General Hospital between March 2015 and December 2016 were enrolled. Whole exome sequencing was performed, and genes with different mutation or copy number variation statuses between patients with and without relapse within 1 year were identified using bioinformatics analysis. A support vector machine was used to evaluate the importance of the differential gene features and to develop a signature. Signature validation was performed in an independent cohort. The associations of the support vector machine signature and single gene features with disease-free survival and overall survival were assessed. Biological functions of the integrated genes were further analysed.
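As an illustration of the screening step, the sketch below compares a gene's mutation status between patients with and without relapse within 1 year. The abstract does not name the statistical test used; a two-sided Fisher's exact test is assumed here, and the gene names, counts, and the 0.05 threshold are hypothetical rather than taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the col1 mutated patients fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, NOT data from the study:
# gene -> (mutated & relapse, wild-type & relapse,
#          mutated & no relapse, wild-type & no relapse)
toy_counts = {
    "GENE_A": (8, 2, 1, 9),
    "GENE_B": (5, 5, 5, 5),
}

p_values = {gene: fisher_exact_two_sided(*counts) for gene, counts in toy_counts.items()}
# Assumed significance threshold of 0.05 for this illustration.
candidates = [gene for gene, p in p_values.items() if p < 0.05]
print(p_values, candidates)
```

In this toy example only the hypothetical GENE_A would pass the screen and be carried forward as a candidate feature for the support vector machine.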
Overall, 30 and 40 patients were included in the training and validation cohorts respectively. Some 11 genes with differential patterns were first identified; using a support vector machine, four features (mutations of DNAH9, TP53, and TUBGCP6, and copy number variation of TMEM132E) were then selected and integrated to construct a predictive signature (the support vector machine classifier). In the training cohort, the 1-year disease-free survival rates were 88 per cent (95 per cent c.i. 73 to 100) and 7 per cent (95 per cent c.i. 1 to 47) in the low-support vector machine subgroup and the high-support vector machine subgroup respectively (P < 0.001). Multivariable analyses showed that high-support vector machine status was significantly and independently associated with both worse overall survival (HR 29.20 (95 per cent c.i. 4.48 to 190.21); P < 0.001) and worse disease-free survival (HR 72.04 (95 per cent c.i. 6.74 to 769.96); P < 0.001). The area under the curve of the support vector machine signature for 1-year disease-free survival (0.900) was significantly larger than the area under the curve values of the mutations of DNAH9 (0.733; P = 0.039), TP53 (0.767; P = 0.024), and TUBGCP6 (0.733; P = 0.023), the copy number variation of TMEM132E (0.700; P = 0.014), TNM stage (0.567; P = 0.002), and differentiation grade (0.633; P = 0.005), suggesting higher predictive accuracy for prognosis. The value of the signature was further validated in the validation cohort. The four genes included in the support vector machine signature (of which DNAH9, TUBGCP6, and TMEM132E were novel in pancreatic ductal adenocarcinoma) were significantly associated with the tumour immune microenvironment, G protein-coupled receptor binding and signalling, and cell-cell adhesion, among other functions.
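To make the area under the curve comparison concrete, the sketch below computes the area under the curve as the probability that a randomly chosen patient with relapse scores higher than a randomly chosen patient without relapse (ties count as one half). The patient labels, the single-gene feature, and the combined score are hypothetical toy values, not data from the study, and the P values reported above for differences between areas under the curve would need a separate test that is not shown here.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both classes must be present to compute an AUC.")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 if tied, 0 otherwise.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, NOT from the study: 1 = relapse within 1 year, 0 = no relapse.
relapse = [1, 1, 1, 1, 0, 0, 0, 0]
single_gene_feature = [1, 1, 0, 0, 0, 0, 1, 0]   # one binary mutation status
combined_score = [3, 2, 2, 1, 1, 0, 1, 0]        # assumed multi-feature risk score

auc_single = roc_auc(single_gene_feature, relapse)
auc_combined = roc_auc(combined_score, relapse)
print(auc_single, auc_combined)
```

In this toy example the combined score ranks patients better than the single binary feature, which mirrors the pattern the abstract reports for the signature against its individual components.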
The newly constructed support vector machine signature precisely and powerfully predicted relapse and survival in patients with stage I-II pancreatic ductal adenocarcinoma after R0 resection.