Construction of a random survival forest model based on a machine learning algorithm to predict early recurrence after hepatectomy for adult hepatocellular carcinoma.

Authors

Zhang Ji, Chen Qing, Zhang Yu, Zhou Jie

Affiliations

Department of Hepatobiliary Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.

Department of Biochemistry and Molecular Biology, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.

Publication information

BMC Cancer. 2024 Dec 25;24(1):1575. doi: 10.1186/s12885-024-13366-4.

Abstract

BACKGROUND AND AIMS

Hepatocellular carcinoma (HCC) is prone to early recurrence after liver resection, which leads to a poor prognosis. At present, the majority of models for predicting early postoperative recurrence of HCC rely on the linearity assumption of the Cox proportional hazards (CPH) model. However, the predictive performance of this model is limited by the complexity of clinical data. The present study investigates the performance of the random survival forest (RSF) model, a machine learning algorithm, in predicting early postoperative recurrence of HCC and compares it with that of the traditional CPH model, in order to clarify the potential advantages of the RSF model for this clinical problem.
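For readers unfamiliar with the linearity assumption mentioned above, the standard form of the Cox model can be written as follows. The notation (baseline hazard h_0, covariates x_1..x_p, coefficients β_1..β_p) is conventional textbook notation and is not taken from the abstract.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right)
\quad\Longleftrightarrow\quad
\log\frac{h(t \mid x)}{h_0(t)} = \sum_{k=1}^{p} \beta_k x_k
```

The log hazard ratio is a linear function of the covariates and does not depend on time. Nonlinear effects and interactions therefore have to be specified by hand, whereas a tree-based method such as RSF can capture them without that specification.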

METHODS

This retrospective cohort study was conducted at a single center. After excluding 41 patients, a total of 541 patients were included in the final model construction and subsequent analysis. The patients were randomly divided into two groups at a 7:3 ratio: a training group (n = 378) and a validation group (n = 163). Least absolute shrinkage and selection operator (LASSO) regression was used to identify risk factors in the training group. The identified factors were then used to develop the RSF and CPH regression models. The predictive ability of each model was assessed using the concordance index (C-index). The accuracy of the model predictions was evaluated using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). The clinical practicality of each model was measured by decision curve analysis (DCA), and overall performance was evaluated using the Brier score. The RSF model was visualized using the Shapley additive explanations (SHAP) framework. Finally, the RSF, CPH regression, and albumin-bilirubin (ALBI) grade models were compared.
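The C-index used above can be illustrated with a minimal sketch of Harrell's concordance index for right-censored data. This is not the authors' code. The function name `harrell_c_index` and the six-patient toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times       -- follow-up time per patient (e.g. months)
    events      -- 1 if recurrence was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an
        # observed event; tied times are skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier recurrence received the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [4, 7, 9, 12, 15, 18]
toy_events = [1, 1, 0, 1, 0, 0]
toy_risk = [0.9, 0.6, 0.7, 0.4, 0.5, 0.1]

toy_c_index = harrell_c_index(toy_times, toy_events, toy_risk)
print(round(toy_c_index, 3))
```

In the toy data, 9 of the 11 comparable pairs are ranked correctly, so the C-index is 9/11. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.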

RESULTS

LASSO regression selected the following variables: alpha-fetoprotein (AFP), gamma-glutamyl transpeptidase to platelet ratio (GPR), blood transfusion (BT), microvascular invasion (MVI), large vessel invasion (LVI), Edmondson-Steiner (ES) grade, liver capsule invasion (LCI), satellite nodule (SN), and Barcelona Clinic Liver Cancer (BCLC) grade. An RSF model was then developed using 500 trees, and the variable importance (VIMP) ranking was MVI, LCI, SN, BT, BCLC, ES grade, AFP, GPR and LVI. Using these factors, the RSF and CPH regression models were developed and compared with the ALBI grade model. In the training and validation groups, the C-index of the RSF model (0.896 and 0.798, respectively) was higher than that of the CPH regression model (0.803 and 0.772, respectively) and the ALBI grade model (0.517 and 0.515, respectively). Three time points were selected to assess the predictive capabilities of these models: 6, 12 and 18 months. In the training group, the AUC of the RSF model at 6, 12 and 18 months was 0.971 (95% CI: 0.955-0.988), 0.919 (95% CI: 0.887-0.951) and 0.899 (95% CI: 0.867-0.932), respectively. In the validation group, the AUC of the RSF model at 6, 12 and 18 months was 0.830 (95% CI: 0.728-0.932), 0.856 (95% CI: 0.787-0.924) and 0.832 (95% CI: 0.764-0.901), respectively. In both groups, the AUC values of the RSF model were higher than those of the CPH regression model and the ALBI grade model. The DCA results showed that the net clinical benefit of the RSF model exceeded that of the CPH regression model and the ALBI grade model in both groups, suggesting greater clinical utility. The Brier score of the RSF model at 6, 12 and 18 months was 0.062, 0.125 and 0.178, respectively, in the training group, and 0.111, 0.128 and 0.149, respectively, in the validation group.
In summary, the RSF model outperformed the CPH regression model and the ALBI grade model in predictive ability, accuracy, clinical practicality, and overall performance. In addition, the RSF model stratified patients into three distinct risk groups (low, medium and high risk) in both the training and validation groups (p < 0.001).
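To help interpret the Brier scores reported above, the following is a minimal sketch of a Brier score at a fixed time horizon. It is a deliberately simplified version: patients censored before the horizon are dropped, rather than being handled with the inverse-probability-of-censoring weights that survival software normally applies. The abstract does not state which variant the authors used. The function name `brier_score_at` and the toy data are hypothetical.

```python
def brier_score_at(horizon, times, events, predicted_risk):
    """Simplified Brier score at a fixed horizon (no censoring weights).

    horizon        -- evaluation time point (e.g. 12 months)
    times          -- follow-up time per patient
    events         -- 1 if recurrence was observed, 0 if censored
    predicted_risk -- predicted probability of recurrence by `horizon`
    """
    squared_errors = []
    for t, e, p in zip(times, events, predicted_risk):
        if t <= horizon and e:
            outcome = 1.0      # recurrence observed by the horizon
        elif t > horizon:
            outcome = 0.0      # still recurrence-free at the horizon
        else:
            continue           # censored before the horizon: status unknown
        squared_errors.append((p - outcome) ** 2)
    if not squared_errors:
        raise ValueError("no patients with known status at the horizon")
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy data, not taken from the study.
toy_times = [4, 7, 9, 12, 15, 18]
toy_events = [1, 1, 0, 1, 0, 0]
toy_predicted = [0.9, 0.8, 0.5, 0.6, 0.2, 0.1]

toy_brier = brier_score_at(12, toy_times, toy_events, toy_predicted)
uninformative_brier = brier_score_at(12, toy_times, toy_events, [0.5] * 6)
print(round(toy_brier, 3), uninformative_brier)
```

Lower is better. A model that always predicts a probability of 0.5 scores 0.25, which gives a reference point for the reported values of 0.062 to 0.178.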

CONCLUSIONS

The RSF model is effective in predicting early recurrence after HCC surgery and outperforms the CPH regression model and the ALBI grade model. For patients undergoing HCC surgery, it can serve as a valuable tool for clinicians to stratify patients postoperatively into distinct risk categories and to guide subsequent follow-up care.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e97/11670344/94d86ef617f5/12885_2024_13366_Fig1_HTML.jpg
