Singla Deepak, Yadav Inderjit Singh
School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, India.
Curr Genomics. 2022 Jun 10;23(2):77-82. doi: 10.2174/1389202923666220128155537.
Next-generation sequencing (NGS) technologies are routinely used to generate high-throughput sequencing data, which creates a need for easy-to-use, GUI-based data analysis software. Such software could run in parallel with sequencing to analyze the data automatically. At present, very few such tools are available and most of them are commercial, leaving a gap between data generation and data analysis. GAAP is developed on the NodeJS platform and uses HTML and JavaScript as the front end for communication with users. We have implemented FastQC and Trimmomatic for quality checking and quality control. Velvet and Prodigal are integrated for genome assembly and gene prediction, respectively. Annotation is performed with remote NCBI Blast and IPR-Scan. In the back end, we have used Perl and JavaScript for data processing. To evaluate the performance of GAAP, we assembled a viral (SRR11621811), a bacterial (SRR17153353) and a human (SRR16845439) genome. On a desktop computer, GAAP assembled and annotated a COVID-19 genome, producing a single contig of 27994 bp with 99.57% reference genome coverage. This assembly predicted 11 genes, of which 10 were annotated using the annotation module of GAAP. The bacterial and human genomes were assembled into 138 and 194281 contigs with N50 values of 100399 and 610, respectively. In this study, we have developed freely available, platform-independent genome assembly and annotation (GAAP) software (www.deepaklab.com/gaap). The software acts as a complete data analysis package with quality check, quality control, genome assembly, gene prediction and annotation (Blast, PFAM, GO-Term, pathway and enzyme mapping) modules.
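The N50 values quoted above summarize assembly contiguity: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The abstract does not describe how GAAP computes it, so the following is only a minimal sketch of the standard definition. The function name `n50` and the toy contig lengths are hypothetical and are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    # Walk through the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical toy assembly (illustrative only, not from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # prints 70
```

In the toy example the total length is 290, so half is 145. The two longest contigs (80 + 70 = 150) are the first to reach that threshold, which gives an N50 of 70.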