Anuntasomboon Pornchai, Siripattanapipong Suradej, Unajak Sasimanas, Choowongkomon Kiattawee, Burchmore Richard, Leelayoova Saovanee, Mungthin Mathirut, E-Kobon Teerasak
Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand.
Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok 10900, Thailand.
Biology (Basel). 2022 Aug 26;11(9):1272. doi: 10.3390/biology11091272.
(formerly named ) has been neglected for years in Thailand. The genomic study of has gained much attention since the release of the first high-quality reference genome of the isolate LSCM4. Integrating multiple sequencing platforms for whole-genome sequencing has proven effective, but at considerable cost. This study presents a preliminary bioinformatic workflow that couples multi-step de novo assembly with a reference-based assembly method to produce high-quality draft genomes from the short-read Illumina sequence data of isolate PCM2.
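As an illustration of how such a workflow might be laid out, the following minimal Python sketch builds the shell commands for one plausible ordering of the steps: a first-pass MEGAHIT assembly, a SPAdes assembly that reuses the MEGAHIT contigs, and a mapping step that salvages read pairs not aligning to the reference. The abstract does not state the read mapper, the order of the steps, or any parameters, so the use of BWA and samtools, the `--trusted-contigs` hand-off, the file names, and the function name `plan_hybrid_assembly` are all assumptions made for illustration and not the authors' documented pipeline.

```python
from typing import List


def plan_hybrid_assembly(r1: str, r2: str, reference: str, outdir: str,
                         threads: int = 8) -> List[str]:
    """Return shell commands for a hypothetical hybrid assembly run.

    Nothing is executed here; the caller decides how to run the commands.
    """
    megahit_dir = f"{outdir}/megahit"   # must not exist before MEGAHIT runs
    spades_dir = f"{outdir}/spades"
    unmapped_1 = f"{outdir}/unmapped_1.fq"
    unmapped_2 = f"{outdir}/unmapped_2.fq"
    return [
        # Step 1 (assumed): first-pass de novo assembly of the paired reads.
        f"megahit -1 {r1} -2 {r2} -t {threads} -o {megahit_dir}",
        # Step 2 (assumed): second de novo pass that reuses the MEGAHIT contigs.
        f"spades.py -1 {r1} -2 {r2} "
        f"--trusted-contigs {megahit_dir}/final.contigs.fa "
        f"-t {threads} -o {spades_dir}",
        # Step 3 (assumed mapper): index the reference genome for BWA.
        f"bwa index {reference}",
        # Step 4: map reads to the reference and keep pairs where both mates
        # are unmapped (SAM flag 12) so they can be assembled separately.
        f"bwa mem -t {threads} {reference} {r1} {r2} | "
        f"samtools fastq -f 12 -1 {unmapped_1} -2 {unmapped_2} -",
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace with real paths before running anything.
    for cmd in plan_hybrid_assembly("PCM2_R1.fq", "PCM2_R2.fq",
                                    "reference.fa", "pcm2_out"):
        print(cmd)
```

Returning the commands as strings rather than running them keeps the sketch easy to inspect and adapt; the salvaged unmapped pairs would then be assembled de novo and merged with the reference-guided contigs in whatever way the full method describes.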
Integrating multi-step de novo assembly by MEGAHIT and SPAdes with the reference-based method using the genome, and salvaging the unmapped reads, produced a 30.27 Mb draft genome of isolate PCM2 with 3367 contigs and 8887 predicted genes. Of the approaches tested, the integrated approach showed the best integrity, coverage, and contig alignment when compared to the genome of isolate LSCM4, collected from the northern province of Thailand. Similar patterns of gene ratios and frequency were observed in the GO biological process annotation. Fifty GO terms were assigned to the assembled genomes, and 23 of these (accounting for 61.6% of the annotated genes) showed higher gene counts and ratios in the results from our workflow than in those of the LSCM4 isolate.
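Figures such as the contig count and total draft size reported above can be recomputed from any assembly FASTA file. The sketch below is a minimal example, assuming a standard FASTA layout; the function names are illustrative, and N50 is included only because it is a commonly reported contiguity metric, not because the abstract reports it.

```python
from typing import Dict, Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Summarise an assembly: contig count, total bases, and N50."""
    total = sum(lengths)
    running = 0
    n50 = 0
    # N50: length of the contig at which the cumulative sum of lengths,
    # taken from longest to shortest, first reaches half of the total.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"contigs": len(lengths), "total_bp": total, "n50": n50}


if __name__ == "__main__":
    # Toy input; for a real assembly pass an open file handle instead.
    toy = [">c1", "AAAAAAAAAA", ">c2", "CCCC", "CC", ">c3", "GGGG"]
    print(assembly_stats(contig_lengths(toy)))
```

For a real draft, `contig_lengths(open("assembly.fasta"))` would stream the file line by line (the file name here is hypothetical), so memory use stays small even for assemblies with thousands of contigs.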
These results indicate that our proposed bioinformatic workflow produced a genome of strain PCM2 of acceptable quality for functional genomic analysis while making maximal use of the short-read data. The workflow could provide the extensive information needed to identify strain-specific markers and virulence-associated genes useful for drug and vaccine development, before a more exhaustive and expensive investigation is undertaken.