Chong Sin Yee, Azmi Aida Azrina, Cheah Yoke Kqueen
Unit of Molecular Biology and Bioinformatics, Department of Biomedical Science, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor Darul Ehsan, Malaysia.
Halal Science Research, Halal Products Research Institute, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor Darul Ehsan, Malaysia.
Data Brief. 2023 Oct 12;51:109657. doi: 10.1016/j.dib.2023.109657. eCollection 2023 Dec.
39 is a rare actinobacterial strain isolated from Antarctic soil, a less explored extreme environment. Here, we present the whole-genome sequencing and annotation data from the high-quality draft genome of from Antarctica. The extracted genomic deoxyribonucleic acid (DNA) was sequenced on the PacBio Sequel platform and then on the Illumina HiSeq system. The assembly generated with Canu 1.7 and polished with Pilon was then annotated to characterize the complete genomic information of the strain. Several bioinformatics approaches were used to establish a high-quality draft genome basis for and to provide a better understanding of its biological and molecular functions. A total of 83,639 reads were predicted from the 3.6 Mb genome, which has a guanine-cytosine (GC) content of 72.39%. The genome was assembled into two contigs: the larger contig represents the chromosome and the smaller contig represents the plasmid. It contains 3,381 coding genes, about 95% of which are functionally annotated. It comprises 3,318 coding sequences, one tmRNA gene, 57 tRNA genes, and five repeat regions. was evident, sharing close sequence similarity with the species and the family . Gene Ontology (GO) functional classification indicated that cell and cell part terms were highly represented in the cellular component category, catalytic activity and binding were the most enriched terms in the molecular function category, and metabolic and cellular processes were the most represented in the biological process category. Clusters of Orthologous Groups (COG) functional classification revealed that metabolism-related genes were highly enriched, mostly mapping to amino acid transport and metabolism, transcription, and energy production and conversion.
Moreover, the Kyoto Encyclopedia of Genes and Genomes (KEGG) functional classification showed that metabolism was the most represented KEGG pathway category. A total of 52 biosynthetic gene clusters involved in secondary metabolite biosynthesis were identified, suggesting that has potential antibacterial, antifungal, cytotoxic, and inhibitory bioactivities. The whole-genome sequence dataset of has been deposited in the European Nucleotide Archive (ENA) under accession number PRJEB44986 / ERP129097. The genome annotation dataset of has been deposited in Zenodo. The reported genome sequence data for contribute comprehensive information to the current molecular knowledge of the species and may help facilitate advances in medicine.
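To illustrate how assembly-level statistics such as those reported above (total genome size, GC content, and a two-contig assembly) can be derived from an assembly file, the following minimal Python sketch computes contig lengths and pooled GC content from FASTA text. It is not the authors' pipeline. The helper names (`read_fasta`, `gc_content`, `summarize`) and the toy sequences are hypothetical, and the sketch assumes GC content is calculated as (G + C) / (A + C + G + T), ignoring ambiguous bases such as N.

```python
# Minimal sketch (hypothetical helpers, not the authors' pipeline):
# summarize a draft assembly supplied as FASTA text.
from typing import Dict, List, Optional, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: uppercase sequence}."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def summarize(records: Dict[str, str]) -> Tuple[int, float, List[Tuple[str, int]]]:
    """Return total length, pooled GC %, and (id, length) pairs sorted longest first."""
    total = sum(len(seq) for seq in records.values())
    pooled_gc = gc_content("".join(records.values()))
    contigs = sorted(
        ((rid, len(seq)) for rid, seq in records.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return total, pooled_gc, contigs


if __name__ == "__main__":
    # Toy two-contig input; NOT data from this genome.
    toy = ">contig_1\nGGCCGGCCAT\n>contig_2 candidate_plasmid\nGCAT\n"
    total, gc, contigs = summarize(read_fasta(toy))
    print(total, round(gc, 2), contigs)
    # prints: 14 71.43 [('contig_1', 10), ('contig_2', 4)]
```

Sorting contigs by length mirrors the description above, in which the larger contig represents the chromosome; length alone, however, does not establish that the smaller contig is a plasmid.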