Icer Baykal Pelin B, Lara James, Khudyakov Yury, Zelikovsky Alex, Skums Pavel
Department of Computer Science, Georgia State University, 25 Park Place, Atlanta, GA 30302, USA.
Division of Viral Hepatitis, Centers for Disease Control and Prevention, 1600 Clifton Rd., Atlanta, GA 30329, USA.
Virus Evol. 2020 Dec 30;7(1):veaa103. doi: 10.1093/ve/veaa103. eCollection 2021 Jan.
Detection of incident hepatitis C virus (HCV) infections is crucial for identification of outbreaks and development of public health interventions. However, there is no single diagnostic assay for distinguishing recent and persistent HCV infections. HCV exists in each infected host as a heterogeneous population of genomic variants, whose evolutionary dynamics remain incompletely understood. Genetic analysis of such viral populations can be applied to the detection of incident HCV infections and used to understand intra-host viral evolution. We studied intra-host HCV populations sampled using next-generation sequencing from 98 recently and 256 persistently infected individuals. The genetic structure of the populations was evaluated using 245,878 viral sequences from these individuals and a set of selected features measuring their diversity, topological structure, complexity, strength of selection, epistasis, evolutionary dynamics, and physico-chemical properties. Distributions of the viral population features differ significantly between recent and persistent infections. A general increase in viral genetic diversity from recent to persistent infections is frequently accompanied by a decline in genomic complexity and an increase in structuredness of the HCV population, likely reflecting a high level of intra-host adaptation at later stages of infection. Using these findings, we developed a machine learning classifier for infection staging, which yielded a detection accuracy of 95.22 per cent, higher than that of other genomics-based models. The detection of a strong association between several HCV genetic factors and stages of infection suggests that the intra-host HCV population develops in a complex but regular and predictable manner in the course of infection. The proposed models may serve as a foundation for cyber-molecular assays for staging infection, which could potentially complement and/or substitute standard laboratory assays.
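As a rough illustration of what population "diversity" and "complexity" features can look like, the sketch below computes two textbook measures on a set of aligned reads from one host: mean per-site pairwise Hamming distance (a nucleotide-diversity-style statistic) and the Shannon entropy of haplotype frequencies. These particular measures and the function names (`hamming`, `mean_pairwise_distance`, `haplotype_entropy`) are illustrative assumptions. This is not the authors' feature set or pipeline, and it assumes the reads are already aligned to equal length.

```python
from collections import Counter
from itertools import combinations
import math


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_distance(seqs: list[str]) -> float:
    """Average per-site Hamming distance over all unordered pairs of reads.

    Illustrative diversity measure (assumption): higher values mean the
    intra-host population is more genetically spread out.
    """
    if len(seqs) < 2:
        return 0.0
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(hamming(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


def haplotype_entropy(seqs: list[str]) -> float:
    """Shannon entropy (natural log) of the haplotype frequency distribution.

    Illustrative complexity measure (assumption): 0 when all reads are
    identical, larger when reads are split across many distinct variants.
    """
    counts = Counter(seqs)
    n = len(seqs)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    # Toy reads (hypothetical data, not from the study).
    reads = ["ACGT", "ACGT", "ACGA", "TCGA"]
    print(mean_pairwise_distance(reads))  # 7 mismatches / (6 pairs * 4 sites)
    print(haplotype_entropy(reads))       # 1.5 * ln 2
```

In a staging setting, per-host values such as these would form one row of a feature table that a classifier is trained on; which features and which classifier to use is a design decision the sketch does not attempt to reproduce.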