Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, Germany.
Institute of Medical Microbiology, Justus Liebig University Giessen, Giessen, Germany.
PLoS Comput Biol. 2020 Mar 5;16(3):e1007134. doi: 10.1371/journal.pcbi.1007134. eCollection 2020 Mar.
Whole-genome sequencing of bacteria has become a daily routine in many fields. Advances in DNA sequencing technologies and continuously dropping costs have resulted in a tremendous increase in the amount of available sequence data. However, comprehensive in-depth analysis of the resulting data remains an arduous and time-consuming task. To keep pace with these promising but challenging developments and to transform raw data into valuable information, standardized analyses and scalable software tools are needed. Here, we introduce ASA3P, a fully automatic, locally executable and scalable assembly, annotation and analysis pipeline for bacterial genomes. The pipeline automatically executes the necessary data processing steps, i.e., quality clipping and assembly of raw sequencing reads, scaffolding of contigs and annotation of the resulting genome sequences. Furthermore, ASA3P conducts comprehensive genome characterizations and analyses, e.g., taxonomic classification, detection of antibiotic resistance genes and identification of virulence factors. All results are presented via an HTML5 user interface providing aggregated information, interactive visualizations and access to intermediate results in standard bioinformatics file formats. We distribute ASA3P in two versions: a locally executable Docker container for small-to-medium-scale projects and an OpenStack-based cloud computing version able to automatically create and manage self-scaling compute clusters. Thus, automatic and standardized analysis of hundreds of bacterial genomes becomes feasible within hours. The software and further information are available at asap.computational.bio.