Javadzadeh Sara, Rajkumar Utkrisht, Nguyen Nam, Sarmashghi Shahab, Luebeck Jens, Shang Jingbo, Bafna Vineet
Department of Computer Science & Engineering, UC San Diego, La Jolla, California, USA.
Boundless Bio, Inc. 11099 N Torrey Pines Rd, La Jolla, CA, USA.
NAR Genom Bioinform. 2022 Apr 26;4(2):lqac032. doi: 10.1093/nargab/lqac032. eCollection 2022 Jun.
DNA viruses are important infectious agents known to mediate a large number of human diseases, including cancer. Viral integration into the host genome and the formation of hybrid transcripts are also associated with increased pathogenicity. The high variability of viral genomes, however, requires the use of sensitive ensemble hidden Markov models that add to the computational complexity, often requiring >40 CPU-hours per sample. Here, we describe FastViFi, a fast 2-stage filtering method that reduces the computational burden. On simulated and cancer genomic data, FastViFi reduced the running time by 2 orders of magnitude while maintaining comparable accuracy on challenging data sets. Recently published methods have focused on identifying the location of viral integration into the human host genome using local assembly, but do not extend to RNA. To identify human-viral hybrid transcripts, we additionally developed ensemble hidden Markov models for the Epstein-Barr virus (EBV) to add to the models for the hepatitis B virus (HBV), the hepatitis C virus (HCV) and the human papillomavirus (HPV), and used FastViFi to query RNA-seq data from gastric cancer (EBV) and liver cancer (HBV/HCV). FastViFi ran in <10 minutes per sample and identified multiple hybrids that fuse viral and human genes, suggesting new mechanisms for oncoviral pathogenicity. FastViFi is available at https://github.com/sara-javadzadeh/FastViFi.