Department of Laboratory Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, China.
ISTBI and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.
J Infect Dis. 2021 Jun 4;223(11):1887-1896. doi: 10.1093/infdis/jiaa647.
Hepatitis B virus (HBV) infection is one of the leading causes of hepatocellular carcinoma (HCC) worldwide. However, it remains uncertain how the HBV reverse-transcriptase (rt) gene contributes to HCC progression.
We enrolled 307 patients with chronic hepatitis B (CHB) and 237 with HBV-related HCC from 13 medical centers. Sequence features comprised multidimensional attributes of rt nucleic acid and rt/s amino acid sequences. Machine-learning models were used to build HCC prediction algorithms. Model performance was evaluated in the training and independent validation cohorts using receiver operating characteristic (ROC) curves and calibration plots.
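To make the calibration step concrete, here is a minimal sketch of how the points of a calibration plot can be computed: predictions are grouped into equal-width probability bins, and the mean predicted risk in each bin is compared with the observed fraction of positive cases. The function name `calibration_bins`, the bin count, and the toy data are illustrative assumptions, not the procedure or data used in the study.

```python
def calibration_bins(labels, probs, n_bins=5):
    """Return (mean predicted risk, observed positive fraction, count) per non-empty bin.

    labels: 0/1 outcomes (hypothetical coding: 1 = HCC, 0 = CHB).
    probs:  predicted probabilities in [0, 1].
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        # Equal-width bins; p == 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            points.append((mean_pred, observed, len(members)))
    return points


# Toy data for illustration only (not from the study).
toy_labels = [0, 0, 1, 1]
toy_probs = [0.1, 0.2, 0.8, 0.9]
toy_points = calibration_bins(toy_labels, toy_probs, n_bins=2)
```

For a well-calibrated model, the mean predicted risk and the observed fraction in each bin lie close to the diagonal when plotted against each other.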
A random forest (RF) model based on combined metrics (10 features) showed the best predictive performance in both cross-validation and independent validation (area under the ROC curve [AUC], 0.96; accuracy, 0.90), irrespective of HBV genotype and sequencing depth. Moreover, individual HCC risk scores obtained from the RF model (AUC, 0.966; 95% confidence interval, 0.922-0.989) outperformed α-fetoprotein (AUC, 0.713; 95% confidence interval, 0.632-0.784) in distinguishing patients with HCC from those with CHB.
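The AUC values compared above can be read as the probability that a randomly chosen HCC patient receives a higher score than a randomly chosen CHB patient. The sketch below computes AUC directly from that definition for two competing scores. The function name `roc_auc` and all numbers are hypothetical toy values chosen for illustration; they are not the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    labels: 0/1 outcomes (hypothetical coding: 1 = HCC, 0 = CHB).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
model_score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]   # stands in for a model risk score
marker_score = [0.6, 0.3, 0.7, 0.5, 0.4, 0.2]  # stands in for a single biomarker

auc_model = roc_auc(toy_labels, model_score)
auc_marker = roc_auc(toy_labels, marker_score)
```

The pairwise loop is quadratic in the number of patients, which is fine for a sketch; a rank-based formulation gives the same value more efficiently on large cohorts.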
Our study provides the first evidence that HBV rt sequences contain HBV quasispecies features vital for predicting HCC. Integrating deep sequencing with feature extraction and machine-learning models benefits longitudinal surveillance of CHB and HCC risk assessment.