Ha Yeonjung, Lee Seungseok, Lim Jihye, Lee Kwanjoo, Chon Young Eun, Lee Joo Ho, Lee Kwan Sik, Kim Kang Mo, Shim Ju Hyun, Lee Danbi, Yon Dong Keon, Lee Jinseok, Lee Han Chu
Department of Gastroenterology, CHA Bundang Medical Center, CHA University, Seongnam-si, Gyeonggi-do, South Korea.
Department of Biomedical Engineering, College of Electronics and Informatics, Kyung Hee University, Yongin-si, Gyeonggi-do, South Korea.
Liver Int. 2025 Apr;45(4):e16139. doi: 10.1111/liv.16139. Epub 2024 Dec 18.
This study aimed to develop and validate a machine learning (ML) model for predicting hepatocellular carcinoma (HCC) in chronic hepatitis B (CHB) patients after the first 5 years of entecavir (ETV) or tenofovir (TFV) therapy.
CHB patients who had been treated with ETV/TFV for > 5 years and had not been diagnosed with HCC during the first 5 years of therapy were selected from two hospitals. For model development, we used 36 variables, including baseline characteristics (age, sex, cirrhosis, and type of antiviral agent) and laboratory values (at baseline, at 5 years, and the changes over those 5 years). Five machine learning algorithms were applied to the training dataset and internally validated using a held-out test dataset. External validation was then performed in a separate cohort.
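One common way to combine two classifiers is soft voting, in which each model's predicted probability is averaged per patient. The abstract does not state how the two models were combined, so the following is only a minimal sketch under that assumption; the function name `soft_vote`, the parameter `weight_lr`, and the equal default weighting are hypothetical and not taken from the study.

```python
def soft_vote(prob_lr, prob_rf, weight_lr=0.5):
    """Blend two models' predicted HCC probabilities by weighted averaging.

    Assumption: the blending rule and the 0.5 default weight are illustrative
    only; the study's actual ensembling method is not described here.
    """
    if len(prob_lr) != len(prob_rf):
        raise ValueError("both models must score the same patients")
    if not 0.0 <= weight_lr <= 1.0:
        raise ValueError("weight_lr must lie in [0, 1]")
    weight_rf = 1.0 - weight_lr
    return [weight_lr * a + weight_rf * b for a, b in zip(prob_lr, prob_rf)]


# Hypothetical per-patient probabilities from logistic regression and random forest.
blended = soft_vote([0.2, 0.8], [0.4, 0.6])
```

Averaging probabilities rather than hard class labels keeps the output continuous, so a threshold-free metric such as AUC can still be computed on the blended score.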
During years 5-15 of therapy, 279/5908 (4.7%) and 25/562 (4.5%) patients developed HCC in the derivation and external validation cohorts, respectively. In the training dataset (n = 4726), logistic regression showed the highest area under the receiver operating characteristic curve (AUC) among the individual ML algorithms, at 0.803, with a balanced accuracy of 0.735. An ensemble model combining logistic regression and random forest performed best overall (AUC, 0.811; balanced accuracy, 0.754). Results in the test dataset (n = 1182) confirmed the good performance of the ensemble model (AUC, 0.784; balanced accuracy, 0.712), as did external validation (AUC, 0.862; balanced accuracy, 0.771). A web-based calculator was developed (http://ai-wm.khu.ac.kr/HCC/).
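For readers less familiar with the two reported metrics, the sketch below computes them from scratch using their standard definitions: AUC as the probability that a randomly chosen patient with HCC receives a higher score than a randomly chosen patient without HCC, and balanced accuracy as the mean of sensitivity and specificity. The labels, scores, and the 0.5 cut-off are made-up illustrative values, not study data, and the study's actual classification threshold is not given in the abstract.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, scores, threshold=0.5):
    """Mean of sensitivity and specificity at a given score threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


# Made-up example: 1 = developed HCC, 0 = did not.
labels = [0, 0, 1, 1, 0, 1]
risk = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
auc = roc_auc(labels, risk)
bal_acc = balanced_accuracy(labels, risk)
```

Balanced accuracy is a natural companion to AUC in this setting because fewer than 5% of patients developed HCC; plain accuracy would look high for a model that simply predicted no HCC for everyone.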
The proposed ML model predicted HCC risk beyond year 5 of ETV/TFV therapy with good discrimination and could therefore facilitate individualised HCC surveillance based on risk stratification.