Bhuvaneshwar Krithika, Song Lei, Madhavan Subha, Gusev Yuriy
Innovation Center for Biomedical Informatics, Georgetown University, Washington, DC, United States.
Front Microbiol. 2018 Jun 5;9:1172. doi: 10.3389/fmicb.2018.01172. eCollection 2018.
An estimated 17% of cancers worldwide are associated with infectious causes. The extent and biological significance of viral presence/infection in actual tumor samples are generally unknown but could be measured using human transcriptome (RNA-seq) data from tumor samples. We present an open source bioinformatics pipeline, viGEN, which allows not only the detection and quantification of viral RNA, but also the detection of variants in the viral transcripts. The pipeline includes four major modules: the first module aligns reads and filters out human RNA sequences; the second module maps and counts the remaining (unaligned) reads against reference genomes of all known and sequenced human viruses; the third module quantifies read counts at the individual viral-gene level, thus allowing for downstream differential expression analysis of viral genes between case and control groups; and the fourth module calls variants in these viruses. To the best of our knowledge, there are no publicly available pipelines or packages that provide this type of complete analysis in one open source package. In this paper, we applied the viGEN pipeline to two case studies. We first demonstrate our pipeline on a large public dataset, the TCGA cervical cancer cohort. In the second case study, we performed an in-depth analysis of a small focused study of TCGA liver cancer patients. In the latter cohort, we performed viral-gene quantification, viral-variant extraction, and survival analysis. This allowed us to find differentially expressed viral transcripts and viral variants between the groups of patients, and to connect them to clinical outcome. Our analyses show that we were able to successfully detect human papillomavirus among the TCGA cervical cancer patients. We compared the viGEN pipeline with two metagenomics tools and demonstrated similar sensitivity/specificity. We were also able to quantify viral transcripts and extract viral variants using the liver cancer dataset.
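As a rough illustration of the idea behind the third module (viral-gene-level quantification), the sketch below counts, for a single viral reference, how many aligned reads overlap each annotated viral gene. It assumes the reads have already been aligned to the viral genome (the second module's job) and are given as 1-based inclusive (start, end) spans. The gene names, coordinates, and the function `count_reads_per_gene` are hypothetical placeholders, and the overlap rule (a read touching two genes counts toward both) is a simplifying assumption, not necessarily the counting rule viGEN itself uses.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation for a single viral reference genome:
# gene name -> (start, end), 1-based inclusive. Illustrative values only.
GENES: Dict[str, Tuple[int, int]] = {
    "geneA": (1, 300),
    "geneB": (250, 600),
    "geneC": (700, 900),
}


def count_reads_per_gene(
    alignments: List[Tuple[int, int]],
    genes: Dict[str, Tuple[int, int]],
) -> Dict[str, int]:
    """Count aligned reads that overlap each annotated viral gene.

    Each alignment is a (start, end) span, 1-based inclusive, on the viral
    reference. A read spanning two overlapping genes is counted for both
    (a simplifying assumption; counting tools offer stricter overlap modes).
    Reads that overlap no gene are ignored.
    """
    counts = {gene: 0 for gene in genes}
    for r_start, r_end in alignments:
        for gene, (g_start, g_end) in genes.items():
            # Two closed intervals overlap if each starts before the other ends.
            if r_start <= g_end and r_end >= g_start:
                counts[gene] += 1
    return counts


if __name__ == "__main__":
    reads = [(10, 110), (260, 360), (280, 380), (650, 750), (950, 1050)]
    print(count_reads_per_gene(reads, GENES))
    # → {'geneA': 3, 'geneB': 2, 'geneC': 1}
```

Per-gene counts of this kind, collected for every sample, would form the count matrix that a downstream differential-expression test between case and control groups takes as input.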
The results presented corresponded with published literature in terms of rate of detection and the impact of several known variants of the hepatitis B virus (HBV) genome. This pipeline is generalizable and can be used to provide novel biological insights into microbial infections in complex diseases and tumorigenesis. Our viral pipeline could be used in conjunction with additional types of immuno-oncology analysis based on RNA-seq data of host RNA for cancer immunology applications. The source code, with example data and a tutorial, is available at: https://github.com/ICBI/viGEN/.
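The variant-calling step of the fourth module can be pictured, in greatly simplified form, as a pileup: tally the bases observed at each position of the viral reference and flag positions where a non-reference base reaches a minimum depth and allele fraction. The sketch below is a toy illustration only, not viGEN's actual variant caller. The function name `call_variants`, its thresholds (`min_depth`, `min_alt_fraction`), and the assumption of ungapped, pre-aligned reads with 0-based start positions are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_variants(
    reference: str,
    reads: List[Tuple[int, str]],
    min_depth: int = 3,
    min_alt_fraction: float = 0.3,
) -> List[Tuple[int, str, str, int, float]]:
    """Report single-nucleotide variants from reads aligned to a viral reference.

    Each read is (start, sequence) with a 0-based start on the reference and
    no insertions or deletions (a simplifying assumption).
    Returns (1-based position, ref base, alt base, depth, alt fraction).
    """
    # Build a per-position pileup of observed bases.
    pileup: Dict[int, Counter] = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup.setdefault(pos, Counter())[base] += 1

    variants = []
    for pos in sorted(pileup):
        column = pileup[pos]
        depth = sum(column.values())
        if depth < min_depth:
            continue  # too few reads to call anything here
        ref_base = reference[pos]
        # Candidate non-reference bases as (count, base) pairs.
        alts = [(n, b) for b, n in column.items() if b != ref_base]
        if not alts:
            continue
        alt_count, alt_base = max(alts)  # most frequent alternative base
        fraction = alt_count / depth
        if fraction >= min_alt_fraction:
            variants.append((pos + 1, ref_base, alt_base, depth, round(fraction, 2)))
    return variants


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = [(0, "ACGTA"), (0, "ACCTA"), (1, "CCTAC"), (2, "CTACG")]
    print(call_variants(reference, reads))
    # → [(3, 'G', 'C', 4, 0.75)]
```

Production variant callers additionally account for base and mapping quality, strand bias, and indels, all of which this sketch ignores.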