Development and validation of a small-sample machine learning model to predict 5-year overall survival in patients with hepatocellular carcinoma.

Author information

Jiang Tingting, Liu Xingyu, He Wencan, Li Hepei, Yan Xiang, Yu Qian, Mao Shanjun

Affiliations

Division of Head & Neck Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan Province, 610041, China.

Laboratory of Tumor Therapy and Immunology Research, West China Hospital, Sichuan University, Chengdu, Sichuan Province, 610041, China.

Publication information

BMC Cancer. 2025 Jul 1;25(1):1040. doi: 10.1186/s12885-025-14425-0.

Abstract

BACKGROUND

Early-onset hepatocellular carcinoma (HCC) is insidious and is characterized by ready metastasis, a high recurrence rate, and high mortality. To address the substantial time and resources that HCC prognostic prediction demands, we extracted meaningful information from limited small-sample data to develop and validate a machine learning (ML) model that predicts 5-year overall survival (OS) in HCC.

METHODS

A total of 76 patients with newly diagnosed HCC were enrolled between September 2018 and July 2019, with follow-up ranging from 1 to 67 months. According to whether they survived 5 years after their first surgery, patients were divided into a surviving group (n = 34) and a nonsurviving group (n = 42). Pathological data and survival-related factors were collected before treatment, and the features were screened to obtain the final feature subset. Prediction models for 5-year OS were built with logistic regression (LR), support vector machine (SVM), decision tree classification (DTC), random forest (RF), and extreme gradient boosting (XGBoost), and the optimal model was identified after rigorous validation. The models were evaluated by specificity, F1 score, recall, accuracy, and the area under the receiver operating characteristic curve (AUC-ROC); decision curve analysis (DCA) was also used in the evaluation. Finally, internal and external validation were performed to further assess model robustness.
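The abstract names the evaluation metrics (specificity, F1 score, recall, accuracy, AUC-ROC) and decision curve analysis but does not define them. The following is a minimal Python sketch assuming their standard definitions; it is not the study's code and does not reproduce the five classifiers. The helper names (`confusion_counts`, `threshold_metrics`, `roc_auc`, `net_benefit`) and the toy data are hypothetical, and label 1 stands for whichever group is treated as the positive class, which the abstract does not specify.

```python
# Illustrative metric helpers (hypothetical names, not the study's code).
# Labels: 1 = positive class, 0 = negative class. Which outcome group is
# "positive" is an assumption; the abstract does not say.


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity), specificity and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a random positive scores above a random
    negative (ties count 1/2), i.e. the normalized Mann-Whitney U statistic."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, scores, pt):
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1):
    TP/n - FP/n * pt / (1 - pt)."""
    y_pred = [1 if s >= pt else 0 for s in scores]
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    n = len(y_true)
    return tp / n - fp / n * pt / (1 - pt)


# Toy data (hypothetical, not from the study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(roc_auc(y_true, scores))  # 8/9, about 0.889
print(threshold_metrics(y_true, [1 if s >= 0.5 else 0 for s in scores]))  # all 2/3
print(net_benefit(y_true, scores, 0.5))  # 1/6, about 0.167
```

A DCA curve is the `net_benefit` value plotted over a range of threshold probabilities and compared with the "treat all" and "treat none" strategies. The pairwise AUC loop costs O(n_pos × n_neg), which is negligible for a 76-patient cohort; a rank-based computation is the usual choice for larger data.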

RESULTS

Feature screening yielded a set of 22 significant variables. Ranked by importance, they were: maximum tumor diameter, presence or absence of distant metastasis, China Liver Cancer (CNLC) stage, albumin (ALB), age, red blood cells (RBC), large-size circulating tumor cells (CTCs), total bilirubin (TBIL), programmed death-ligand 1 (PD-L1)-negative CTCs, ≥ pentaploid CTCs, alpha-fetoprotein (AFP), vascular tumor thrombus and satellite nodules, white blood cells (WBC), CTCs, Barcelona Clinic Liver Cancer (BCLC) stage, multiple nodules, aspartate aminotransferase (AST), PD-L1-negative CTC-WBC clusters, triploid CTCs, lymphocytes (LYM), PD-L1-negative circulating endothelial cell (CEC)-WBC clusters, and degree of cirrhosis. The AUC-ROC values of the LR, SVM, DTC, RF, and XGBoost models for predicting 5-year OS in HCC patients were 0.737, 0.971, 0.657, 0.741, and 0.703, respectively. The SVM model performed best (accuracy = 0.987, F1 score = 0.988, recall = 1.000) and also showed superior performance and stability in internal and external validation.
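The SVM figures can be cross-checked against the cohort size. The sketch below is a hedged arithmetic check, not the authors' analysis or confusion matrix: it assumes the reported accuracy, F1 score, and recall were computed on all 76 patients (34 surviving, 42 nonsurviving), which the abstract does not state. Under that assumption, recall = 1.000 rules out false negatives and accuracy = 0.987 (75/76) leaves exactly one false positive, so the only open question is which group is the positive class. The names `rounded_metrics`, `matches_reported`, and `SCENARIOS` are hypothetical.

```python
# Arithmetic consistency check (not the authors' code or confusion matrix).
# Assumption: the SVM metrics were computed over all 76 patients
# (34 surviving, 42 nonsurviving); the abstract does not say which set was used.

REPORTED = {"accuracy": 0.987, "f1": 0.988, "recall": 1.000}


def rounded_metrics(tp, fp, tn, fn, ndigits=3):
    """Accuracy, recall, F1 and specificity from a 2x2 confusion matrix, rounded."""
    metrics = {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),  # equals the harmonic mean of precision and recall
        "specificity": tn / (tn + fp),
    }
    return {name: round(value, ndigits) for name, value in metrics.items()}


def matches_reported(counts):
    """True if the rounded metrics reproduce every reported value."""
    m = rounded_metrics(*counts)
    return all(m[name] == value for name, value in REPORTED.items())


# Recall = 1.000 forces FN = 0; with n = 76, accuracy 0.987 = 75/76 leaves exactly
# one error, which must then be a false positive. Only the positive class varies.
SCENARIOS = {
    "positive = nonsurviving (42)": (42, 1, 33, 0),  # (TP, FP, TN, FN)
    "positive = surviving (34)": (34, 1, 41, 0),
}

for label, counts in SCENARIOS.items():
    print(label, rounded_metrics(*counts), matches_reported(counts))
```

Under this assumption only the first scenario reproduces all three reported values (F1 = 84/85, about 0.988, versus 68/69, about 0.986, for the second), and it would imply a specificity of 33/34, about 0.971. If the metrics came from a different evaluation set, this inference does not hold.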

CONCLUSION

The SVM model predicted 5-year OS in HCC with good discriminative ability and achieved significantly higher accuracy than traditional models. Clinicians could target the risk factors identified in this model during diagnosis and treatment and thereby improve patient prognosis.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/187d/12211751/1203d13513df/12885_2025_14425_Fig1_HTML.jpg