Department of Hepatobiliary and Pancreatic Surgery, General Surgery Center, First Hospital of Jilin University, Changchun, China.
J Gene Med. 2024 Sep;26(9):e3732. doi: 10.1002/jgm.3732.
This study aims to develop and validate machine learning-based diagnostic and prognostic models to predict the risk of distant lymph node metastases (DLNM) in patients with hepatocellular carcinoma (HCC) and to evaluate the prognosis for this cohort.
This retrospective study analyzes data extracted from the Surveillance, Epidemiology, and End Results (SEER) database, specifically the January 2024 subset.
The study cohort consists of 15,775 patients diagnosed with HCC between 2016 and 2020, as identified in the SEER database.
For the diagnostic model, recursive feature elimination (RFE) is used for variable selection, retaining five predictors: age, tumor size, radiation therapy, T-stage, and serum alpha-fetoprotein (AFP) level. These variables form the basis of a stacking ensemble model, which is interpreted using Shapley Additive Explanations (SHAP). For the prognostic model, backward stepwise regression is used to select the relevant variables, including chemotherapy, radiation therapy, tumor size, and age. These variables are then used to build a prognostic nomogram based on the Cox proportional hazards model.
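As an illustration of what fitting a Cox proportional hazards model optimizes, the following is a minimal sketch of the partial log-likelihood in plain Python. It is not the authors' code: the function name `cox_partial_log_likelihood`, the single binary covariate, and the toy data are all hypothetical, and ties are handled in the simple Breslow manner.

```python
import math

def cox_partial_log_likelihood(beta, covariates, times, events):
    """Cox partial log-likelihood (Breslow handling of tied times).

    beta       -- list of regression coefficients
    covariates -- list of rows, one row of covariate values per patient
    times      -- survival or follow-up time per patient
    events     -- 1 if the event was observed, 0 if censored
    """
    # Linear predictor (log relative hazard) for each patient.
    lp = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time t_i.
        risk = sum(math.exp(lp[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += lp[i] - math.log(risk)
    return total

# Hypothetical toy data: one binary covariate, three patients.
toy_covariates = [[1.0], [0.0], [0.0]]
toy_times = [2.0, 5.0, 8.0]
toy_events = [1, 1, 0]

for b in (0.0, 1.0):
    print(b, cox_partial_log_likelihood([b], toy_covariates, toy_times, toy_events))
```

A fitted model is the `beta` that maximizes this quantity. A nomogram then rescales each term of the linear predictor to a points axis, so that a patient's total points map to a predicted survival probability.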
The outcome of the diagnostic model is the occurrence of DLNM in patients. The outcome of the prognostic model is defined by survival time and survival status.
The stacking-based ensemble model shows good predictive performance, interpretability, and discrimination. The area under the curve (AUC) is 0.767 in the training set and 0.768 in the validation set. The nomogram, constructed using the Cox model, also demonstrates consistent and strong predictive capabilities. In addition, we identified factors that have a substantial impact on DLNM and on prognosis, and we discuss their significance for the model and for clinical practice.
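For readers less familiar with the AUC values reported above, the sketch below shows one common way to compute the area under the ROC curve: as the probability that a randomly chosen positive case (here, a patient with DLNM) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the example labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels -- 1 for positive cases, 0 for negative cases
    scores -- predicted risk scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical scores for four patients (two with the outcome, two without).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Under this reading, an AUC of about 0.77 means that in roughly 77% of such patient pairs the model ranks the patient with DLNM higher. Similar values in the training and validation sets suggest little overfitting.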
Our study identified key predictive factors for DLNM and significant prognostic indicators for HCC patients with DLNM. These findings give clinicians tools to identify individuals at high risk of DLNM and to stratify risk more precisely within this patient subgroup, potentially improving management strategies and patient outcomes.