Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, China.
Department of Laboratory Medicine, Mengchao Hepatobiliary Hospital of Fujian Medical University, Fuzhou.
Int J Cancer. 2021 Aug 1;149(3):717-727. doi: 10.1002/ijc.33564. Epub 2021 Mar 26.
Alpha-fetoprotein (AFP)-negative hepatocellular carcinoma (ANHCC) patients account for more than 30% of all HCC patients and are easily misdiagnosed. This three-phase study was designed to find and validate new ANHCC N-glycan markers identified from The Cancer Genome Atlas (TCGA) database and noninvasive detection. Differentially expressed genes (DEGs) among genes related to N-glycan biosynthesis and degradation were screened from the TCGA database. Serum N-glycan structure abundances were analyzed using N-glycan fingerprint (NGFP) technology. A total of 1340 participants, including ANHCC patients, patients with chronic liver diseases and healthy controls, were enrolled after propensity score matching (PSM). The Lasso algorithm was used to select the most significant N-glycan structure abundances. Three machine learning models [random forest (RF), support vector machine (SVM) and logistic regression (LR)] were used to construct the diagnostic algorithms. All 13 N-glycan structure abundances analyzed by NGFP differed significantly and were selected by Lasso. Among the three machine learning models, the LR algorithm demonstrated the best diagnostic performance for identifying ANHCC in the training cohort (LR: AUC: 0.842, 95%CI: 0.784-0.899; RF: AUC: 0.825, 95%CI: 0.766-0.885; SVM: AUC: 0.610, 95%CI: 0.527-0.684). The LR algorithm again achieved high diagnostic performance in the independent validation cohort (AUC: 0.860, 95%CI: 0.824-0.897). Furthermore, the LR algorithm could stratify ANHCC patients into two distinct subgroups with high or low risk for overall survival and recurrence in follow-up validation. In conclusion, a biomarker panel consisting of 13 N-glycan structure abundances, combined with the best-performing algorithm (LR), was defined and may serve as an effective tool for HCC prediction and prognosis estimation in AFP-negative subjects.
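The diagnostic results above are reported as an AUC with a 95% CI. As a minimal sketch of how such a figure can be computed from model scores, the Python below calculates the AUC as the probability that a randomly chosen case outscores a randomly chosen control, and estimates an interval by percentile bootstrap. The abstract does not state how the intervals were obtained, so the bootstrap is an assumption, and the function names and toy scores are hypothetical, not study data.

```python
import random


def auc(scores_pos, scores_neg):
    """AUC as P(case score > control score); ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, not the paper's)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample cases and controls separately, with replacement.
        pos = [rng.choice(scores_pos) for _ in scores_pos]
        neg = [rng.choice(scores_neg) for _ in scores_neg]
        stats.append(auc(pos, neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical predicted probabilities, for illustration only.
    cases = [0.9, 0.8, 0.7, 0.6, 0.4]
    controls = [0.5, 0.3, 0.2, 0.1, 0.65]
    print("AUC:", auc(cases, controls))
    print("95% CI:", bootstrap_auc_ci(cases, controls))
```

The pairwise loop is quadratic in cohort size, which is fine for a sketch. For a cohort of over a thousand participants, a rank-based formulation or a library routine would be the more practical choice.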