Duchen Dylan, Clipman Steven, Vergara Candelaria, Thio Chloe L, Thomas David L, Duggal Priya, Wojcik Genevieve L
Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA.
Division of Infectious Diseases, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA.
bioRxiv. 2023 Jan 12:2023.01.11.523611. doi: 10.1101/2023.01.11.523611.
Hepatitis B virus (HBV) remains a global public health concern, with over 250 million individuals living with chronic HBV infection (CHB) and no curative therapy currently available. Viral diversity is associated with CHB pathogenesis and immunological control of infection. Improved methods to characterize the viral genome at both the population and intra-host levels could aid drug development efforts. Conventionally, HBV sequencing data are aligned to a linear reference genome, and only sequences capable of aligning to the reference are captured for analysis. Reference selection has additional consequences, including for the construction of sample-specific 'consensus' sequences. It remains unclear how to select a reference from available sequences and whether a single reference is sufficient for genetic analyses. Using simulated short-read sequencing data generated from full-length publicly available HBV genome sequences and HBV sequencing data from a longitudinally sampled individual with CHB, we investigate alternative graph-based alignment approaches. We demonstrate that using a phylogenetically representative 'genome graph' for alignment, rather than linear reference sequences, avoids issues of reference ambiguity, improves alignment, and facilitates the construction of sample-specific consensus sequences genetically similar to an individual's infection. Graph-based methods can therefore improve efforts to characterize the genetics of viral pathogens, including HBV, and may have broad implications in host-pathogen research.
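The reference dependence of consensus construction noted in the abstract can be illustrated with a toy example. The sketch below is not the authors' pipeline: `build_consensus`, its `min_depth` parameter, and the short sequences are hypothetical, and it assumes gapless read placements on a linear reference, ignoring indels, base qualities, and the circular HBV genome. It takes a majority vote at each covered position and falls back to the reference base elsewhere, so the same reads can yield different consensus sequences under different references.

```python
from collections import Counter


def build_consensus(reference, aligned_reads, min_depth=1):
    """Toy majority-vote consensus (hypothetical helper, not from the paper).

    reference     -- linear reference sequence (str)
    aligned_reads -- list of (start, sequence) tuples; 0-based, gapless placements
    min_depth     -- minimum read depth needed to override the reference base
    """
    # One base counter per reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    consensus = []
    for ref_base, counts in zip(reference, pileup):
        depth = sum(counts.values())
        if depth >= min_depth:
            # Covered position: take the most frequent observed base.
            consensus.append(counts.most_common(1)[0][0])
        else:
            # Uncovered position: the consensus inherits the reference base.
            consensus.append(ref_base)
    return "".join(consensus)


# The same two reads placed on two different (made-up) references.
reads = [(0, "ACCT"), (1, "CCTA")]
consensus_a = build_consensus("ACGTACGT", reads)
consensus_b = build_consensus("TTTTTTTT", reads)
```

Here the two consensus sequences agree wherever reads provide coverage and differ only at the uncovered positions, which is one simple way a reference choice can leak into a sample-specific consensus.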