Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA, USA.
Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, CA, USA.
Sci Rep. 2022 Mar 24;12(1):5077. doi: 10.1038/s41598-022-09035-w.
Throughout the COVID-19 pandemic, massive sequencing and data sharing efforts enabled the real-time surveillance of novel SARS-CoV-2 strains throughout the world, the results of which provided public health officials with actionable information to prevent the spread of the virus. However, with great sequencing comes great computation, and while cloud computing platforms bring high-performance computing directly into the hands of all who seek it, optimal design and configuration of a cloud compute cluster requires significant system administration expertise. We developed ViReflow, a user-friendly viral consensus sequence reconstruction pipeline enabling rapid analysis of viral sequence datasets leveraging Amazon Web Services (AWS) cloud compute resources and the Reflow system. ViReflow was developed specifically in response to the COVID-19 pandemic, but it is general to any viral pathogen. Importantly, when utilized with sufficient compute resources, ViReflow can trim, map, call variants, and call consensus sequences from amplicon sequence data from 1000 SARS-CoV-2 samples at 1000X depth in < 10 min, with no user intervention. ViReflow's simplicity, flexibility, and scalability make it an ideal tool for viral molecular epidemiological efforts.
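To make the final step named in the abstract concrete, the following is a minimal sketch of majority-rule consensus calling from already-mapped reads. It is an illustration only, not ViReflow's implementation: the function names `pileup_from_reads` and `call_consensus`, the `(start, sequence)` read representation, and the `min_depth` and `min_freq` thresholds are all assumptions introduced here. It also ignores indels, base qualities, and primer trimming, which a real pipeline would have to handle.

```python
from collections import Counter


def pileup_from_reads(ref_len, aligned_reads):
    """Count bases per reference position.

    aligned_reads is a list of (start, sequence) pairs: a hypothetical,
    gap-free stand-in for real alignments (no indels, no soft clipping).
    """
    pileup = [Counter() for _ in range(ref_len)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                pileup[pos][base] += 1
    return pileup


def call_consensus(pileup, min_depth=10, min_freq=0.5, ambiguous="N"):
    """Emit the majority base per position, or `ambiguous` when coverage
    is below min_depth or no base exceeds min_freq of the reads."""
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append(ambiguous)
            continue
        base, count = max(counts.items(), key=lambda kv: kv[1])
        consensus.append(base if count / depth > min_freq else ambiguous)
    return "".join(consensus)


if __name__ == "__main__":
    # Toy data: four short reads against a 5-base reference.
    reads = [(0, "ACGT"), (0, "ACGT"), (0, "ACTT"), (2, "GT")]
    print(call_consensus(pileup_from_reads(5, reads), min_depth=2))
```

Masking low-coverage positions with `N` rather than falling back to the reference base is a design choice made for this sketch; either convention is possible, and the abstract does not say which one ViReflow uses.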