Shen-Gunther Jane, Xia Qingqing, Cai Hong, Wang Yufeng
Gynecologic Oncology & Clinical Investigation, Department of Clinical Investigation, Brooke Army Medical Center, Fort Sam Houston, San Antonio, TX 78234, USA.
Department of Clinical Investigation, Brooke Army Medical Center, Fort Sam Houston, San Antonio, TX 78234, USA.
Pathogens. 2021 Aug 13;10(8):1026. doi: 10.3390/pathogens10081026.
Next-generation sequencing (NGS) has made human papillomavirus (HPV) virome profiling practical for in-depth investigation of viral evolution and pathogenesis. However, viral computational analysis remains a bottleneck due to semantic discrepancies between computational tools and curated reference genomes. To address this, we developed and tested automated workflows for HPV taxonomic profiling and visualization using a customized papillomavirus database in the CLC Microbial Genomics Module. HPV genomes from the Papilloma Virus Episteme were customized and incorporated into CLC "ready-to-use" workflows for stepwise data processing, comprising: (1) Taxonomic Analysis, (2) Estimate Alpha/Beta Diversities, and (3) Map Reads to Reference. Low-grade (n = 95) and high-grade (n = 60) Pap smears were tested, with the following collective runtimes: Taxonomic Analysis (36 min); Alpha/Beta Diversities (5 s); Map Reads (45 min). Converting tabular output to visualizations took 1-2 keystrokes. Biodiversity analysis comparing low-grade (LSIL) and high-grade squamous intraepithelial lesions (HSIL) revealed loss of species richness and gain of dominance by HPV-16 in HSIL. Integrating clinically relevant, taxonomized HPV reference genomes within automated workflows proved to be an ultra-fast method of virome profiling. The entire process, named "HPV DeepSeq", provides a simple, accurate and practical means of NGS data analysis for a broad range of applications in viral research.
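The terms "species richness" and "dominance" used in the biodiversity comparison can be illustrated with a minimal sketch. It assumes each sample is summarized as read counts per HPV type and uses standard alpha-diversity indices (richness, Shannon, Simpson dominance, Berger-Parker). The abstract does not state which indices the CLC workflow reports, so the function name `alpha_diversity` and the read counts below are hypothetical and are not data from the study.

```python
import math


def alpha_diversity(counts):
    """Compute simple alpha-diversity indices for one sample.

    counts: dict mapping an HPV type label to its mapped-read count
    (hypothetical input format, not the CLC export format).
    """
    # Only types that were actually detected contribute to the indices.
    present = {hpv_type: n for hpv_type, n in counts.items() if n > 0}
    total = sum(present.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    proportions = [n / total for n in present.values()]
    return {
        # Richness: number of distinct HPV types detected.
        "richness": len(present),
        # Shannon index: higher when reads are spread over many types.
        "shannon": -sum(p * math.log(p) for p in proportions),
        # Simpson dominance: near 1 when a single type dominates.
        "simpson_dominance": sum(p * p for p in proportions),
        # Berger-Parker: proportion of reads from the most abundant type.
        "berger_parker": max(proportions),
    }


# Illustrative, made-up counts: an evenly mixed sample and one dominated by HPV-16.
mixed_sample = {"HPV-16": 250, "HPV-31": 250, "HPV-51": 250, "HPV-66": 250}
dominated_sample = {"HPV-16": 900, "HPV-31": 100, "HPV-51": 0, "HPV-66": 0}

mixed = alpha_diversity(mixed_sample)
dominated = alpha_diversity(dominated_sample)
print(mixed["richness"], dominated["richness"])
print(mixed["simpson_dominance"], dominated["simpson_dominance"])
```

In this toy comparison the dominated sample has lower richness and a higher dominance index than the mixed sample, which is the direction of change the abstract describes between LSIL and HSIL.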