Department of Laboratory Medicine, Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, 200438, PR China.
Department of Laboratory Medicine, The 105th Hospital of PLA, Hefei 230031, PR China.
J Gen Virol. 2017 Nov;98(11):2748-2758. doi: 10.1099/jgv.0.000942. Epub 2017 Oct 12.
To investigate whether deletion patterns in the preS region can predict liver disease advancement, the preS region of the hepatitis B virus (HBV) genome was sequenced by next-generation sequencing (NGS) in 45 chronic hepatitis B (CHB) patients and 94 patients with HBV-related hepatocellular carcinoma (HCC), and the percentages of nucleotide deletion in the preS region were analysed. Hierarchical clustering and heatmaps based on preS deletion percentages revealed different deletion patterns between CHB and HCC patients. Intergenotype comparison also indicated divergence in preS deletions between HBV genotypes B and C. No significant difference in preS deletion patterns was found between sera and matched adjacent non-tumour tissues. Based on hierarchical clustering, HCC patients were classified into two groups with different preS deletion patterns and different clinical features. Finally, a support vector machine (SVM) model was trained on preS nucleotide deletion percentages and used to discriminate HCC from CHB patients. Prediction performance was assessed with fivefold cross-validation and independent cohort validation. The median area under the curve (AUC) was 0.729 over 500 repetitions of SVM training with fivefold cross-validation. After parameter optimization, the SVM model was applied to an independent cohort of 51 CHB patients and 72 HCC patients, giving an AUC of 0.727. In conclusion, NGS revealed a prominent divergence in preS deletion patterns between disease groups and between virus genotypes, but not between tissue types. Quantitative NGS data combined with a machine learning method could be a powerful approach for predicting disease status.
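The evaluation protocol described above (repeated fivefold cross-validation summarised by a median AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the function names are hypothetical, the AUC for each repetition is computed from pooled out-of-fold scores (the abstract does not say how it was aggregated), and a simple class-mean scorer stands in for the SVM so that the example needs only the Python standard library. In practice an SVM implementation such as scikit-learn's `SVC` would take its place.

```python
import random
import statistics

def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, keeping the class ratio roughly equal in each fold."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def centroid_scorer(X, y):
    """Placeholder classifier (NOT an SVM): project onto the difference of the class means."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    mu1 = mean([x for x, t in zip(X, y) if t == 1])
    mu0 = mean([x for x, t in zip(X, y) if t == 0])
    w = [a - b for a, b in zip(mu1, mu0)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def repeated_cv_auc(X, y, k=5, repeats=500, seed=0):
    """Median AUC over `repeats` rounds of stratified k-fold cross-validation."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        scores = [0.0] * len(y)
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            score = centroid_scorer([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                scores[i] = score(X[i])  # each sample is scored by a model that never saw it
        aucs.append(auc(scores, y))
    return statistics.median(aucs)

def synthetic_cohort(n_chb=45, n_hcc=94, n_positions=20, seed=1):
    """Synthetic deletion percentages (0-100) per position; label 0 = CHB, 1 = HCC. Not real data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((0, n_chb), (1, n_hcc)):
        for _ in range(n):
            centre = 10.0 + 5.0 * label  # assumed, arbitrary group shift
            X.append([min(100.0, max(0.0, rng.gauss(centre, 10.0))) for _ in range(n_positions)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = synthetic_cohort()
    # 20 repetitions keep the demo fast; the abstract reports 500.
    print("median AUC:", round(repeated_cv_auc(X, y, repeats=20), 3))
```

Stratifying the folds keeps the CHB to HCC ratio similar in every split, which matters with an imbalanced cohort such as 45 versus 94. Reporting the median over many random splits reduces the dependence of the estimate on any single partition.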