Medical Big Data Center, the Second Affiliated Hospital of Nanchang University, Nanchang, PR China.
Department of Gastroenterology, the Second Affiliated Hospital of Nanchang University, Nanchang, PR China.
Ann Hepatol. 2024 Nov-Dec;29(6):101540. doi: 10.1016/j.aohep.2024.101540. Epub 2024 Aug 15.
The increasing incidence of hepatocellular carcinoma (HCC) in China is an urgent issue that calls for early diagnosis and treatment. This study aimed to develop personalized predictive models by combining machine learning (ML) with demographic, medical history, and noninvasive biomarker data. Such models are intended to support physicians' decision-making on HCC in patients with hepatitis B virus (HBV)-related cirrhosis and low serum alpha-fetoprotein (AFP) levels.
A total of 6,980 patients treated between January 2012 and December 2018 were included. Pre-treatment laboratory tests and clinical data were obtained. Significant risk factors for HCC were identified, and the relative risk of each variable for an HCC diagnosis was calculated using ML and univariate regression analysis. The data set was then randomly partitioned into training (80%) and validation (20%) sets to develop the ML models.
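The random 80/20 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `train_validation_split`, the fixed seed, and the placeholder patient identifiers are all assumptions, and the abstract does not say whether the split was stratified by outcome.

```python
import random


def train_validation_split(records, validation_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and return (training, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the partition reproducible (seed is arbitrary).
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Placeholder identifiers only; the count matches the reported cohort size.
patient_ids = list(range(6980))
training_ids, validation_ids = train_validation_split(patient_ids)
print(len(training_ids), len(validation_ids))
```

With 6,980 records and a 20% validation fraction, this yields 5,584 training and 1,396 validation records.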
Twelve independent predictors of HCC were identified using Gaussian naïve Bayes, extreme gradient boosting (XGBoost), random forest, and least absolute shrinkage and selection operator regression models. Multivariate analysis revealed that male sex, age >60 years, alkaline phosphatase >150 U/L, AFP >25 ng/mL, carcinoembryonic antigen >5 ng/mL, and fibrinogen >4 g/L were risk factors, whereas hypertension, calcium <2.25 mmol/L, potassium ≤3.5 mmol/L, direct bilirubin >6.8 μmol/L, hemoglobin <110 g/L, and glutamic-pyruvic transaminase >40 U/L were protective factors. Based on these factors, a nomogram was constructed, showing an area under the curve (AUC) of 0.746 (sensitivity = 0.710, specificity = 0.646), which was significantly higher than the AFP AUC of 0.658 (sensitivity = 0.462, specificity = 0.766). Among the ML algorithms compared, the XGBoost model had an AUC of 0.832 (sensitivity = 0.745, specificity = 0.766) and an independent validation AUC of 0.829 (sensitivity = 0.766, specificity = 0.737), making it the top-performing model in both sets. External validation results supported the accuracy of the XGBoost model.
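For readers less familiar with the metrics reported above, the sketch below shows how AUC, sensitivity, and specificity are commonly computed from predicted scores. It is an illustration only: the labels and scores are made-up toy values, not study data, and the 0.5 cutoff and the function names are assumptions. The abstract does not state how the authors chose their operating thresholds.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold):
    """Classify score >= threshold as positive, then return (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (1 = HCC, 0 = no HCC); values are invented for illustration.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(auc, sens, spec)
```

AUC summarizes ranking quality across all thresholds, while sensitivity and specificity describe a single cutoff, which is why two models can share a specificity (as the AFP and XGBoost figures do at 0.766) yet differ substantially in AUC.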
The proposed XGBoost model showed promising ability for individualized prediction of HCC in patients with HBV-related cirrhosis and low AFP levels.