Stephens Zachary, O'Brien Daniel, Dehankar Mrunal, Roberts Lewis R, Iyer Ravishankar K, Kocher Jean-Pierre
Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, United States of America.
Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States of America.
PLoS One. 2021 Sep 22;16(9):e0250915. doi: 10.1371/journal.pone.0250915. eCollection 2021.
The integration of viruses into the human genome is known to be associated with tumorigenesis in many cancers, but accurate detection of integration breakpoints from short-read sequencing data is made difficult by human-viral homologies, viral genome heterogeneity, coverage limitations, and other factors. To address this, we present Exogene, a sensitive and efficient workflow for detecting viral integrations from paired-end next-generation sequencing data. Exogene's read filtering and breakpoint detection strategies yield integration coordinates that are highly concordant with long-read validation. We demonstrate this concordance across 6 TCGA hepatocellular carcinoma (HCC) tumor samples, identifying hepatitis B virus integrations that are also supported by long reads. Additionally, we applied Exogene to targeted capture data from 426 previously studied HCC samples, achieving 98.9% concordance with existing methods and identifying 238 high-confidence integrations that were not previously reported. Exogene is applicable to multiple types of paired-end sequencing data, including genome, exome, RNA-Seq, and targeted capture data.