Feng Gong, He Na, Xia Harry Hua-Xiang, Mi Man, Wang Ke, Byrne Christopher D, Targher Giovanni, Yuan Hai-Yang, Zhang Xin-Lei, Zheng Ming-Hua, Ye Feng
The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
The First Affiliated Hospital of Xi'an Medical University, Xi'an, China.
J Gastroenterol Hepatol. 2022 Nov;37(11):2145-2153. doi: 10.1111/jgh.15940. Epub 2022 Jul 17.
Over 10% of hepatocellular carcinoma (HCC) cases recur each year, even after surgical resection. Little is currently known about the causes of recurrence or how to prevent it effectively. Predicting HCC recurrence requires diagnostic markers with high sensitivity and specificity. This study aims to identify new key proteins associated with HCC recurrence and to build machine learning models for predicting it.
The proteomics data analyzed in this study were obtained from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database. We identified proteins that were differentially expressed between HCC cases with and without recurrence. Survival analysis, Cox regression analysis, and the area under the receiver operating characteristic curve (AUROC > 0.7) were then used to screen for the most significant of these differential proteins. Predictive models for HCC recurrence were developed using four machine learning algorithms.
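The AUROC screening step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `auroc` and `screen_proteins`, the protein labels `PROT_A` and `PROT_B`, and all values are hypothetical, and the binary recurrence label is used as a simplified stand-in for the study's outcome. AUROC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_proteins(abundance, labels, threshold=0.7):
    """Return {protein: AUROC} for proteins whose AUROC exceeds the threshold.
    Assumption: a marker may be higher or lower in positive cases, so the
    AUROC is taken in whichever direction discriminates better."""
    kept = {}
    for name, values in abundance.items():
        a = auroc(values, labels)
        a = max(a, 1.0 - a)
        if a > threshold:
            kept[name] = a
    return kept


# Made-up toy data, not CPTAC values: 1 = recurrence, 0 = no recurrence.
labels = [1, 1, 1, 0, 0, 0]
abundance = {
    "PROT_A": [2.9, 3.1, 2.5, 1.0, 1.2, 2.6],
    "PROT_B": [1.0, 2.0, 3.0, 1.1, 2.1, 2.9],
}
selected = screen_proteins(abundance, labels)
print(selected)
```

In this toy example only `PROT_A` passes the 0.7 cutoff. In a real analysis the cutoff would be applied alongside the survival and Cox regression filters rather than on its own.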
A total of 690 differentially expressed proteins were identified between 50 relapsed and 77 non-relapsed patients with hepatitis B-related HCC. Seven of these proteins had an AUROC > 0.7 for 5-year survival in HCC: BAHCC1, ESF1, RAP1GAP, RUFY1, SCAMP3, STK3, and TMEM230. Among the machine learning algorithms, random forest showed the highest AUROC for identifying HCC recurrence (AUROC: 0.991, 95% CI 0.962-0.999), followed by the support vector machine (AUROC: 0.893, 95% CI 0.824-0.956), logistic regression (AUROC: 0.774, 95% CI 0.672-0.868), and the multi-layer perceptron (AUROC: 0.571, 95% CI 0.459-0.682).
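The abstract does not say how the 95% confidence intervals for the AUROC values were obtained. One common approach is the percentile bootstrap, sketched below under that assumption. The function name `bootstrap_auroc_ci` and the simulated scores are hypothetical; only the group sizes (50 and 77) mirror the cohort described above.

```python
import random


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap confidence interval.
    Cases are resampled with replacement; resamples that lack one of the
    two classes are discarded because AUROC is undefined for them."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auroc(scores, labels), lower, upper


# Simulated model scores (not study data): 50 recurrent, 77 non-recurrent cases.
rng = random.Random(42)
labels = [1] * 50 + [0] * 77
scores = [rng.gauss(1.0, 1.0) for _ in range(50)] + \
         [rng.gauss(0.0, 1.0) for _ in range(77)]

point, lower, upper = bootstrap_auroc_ci(scores, labels)
print(f"AUROC: {point:.3f}, 95% CI {lower:.3f}-{upper:.3f}")
```

With 127 cases the interval is fairly wide, which is one reason to read differences between closely ranked models with caution.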
Our study identifies seven novel proteins for predicting HCC recurrence and the random forest algorithm as the most suitable predictive model for HCC recurrence.