Babalola Olubukola Oluranti, Adedayo Afeez Adesina, Akinola Saheed Adekunle
Food Security and Safety Focus Area, Faculty of Natural and Agricultural Sciences, North-West University, Private Mail Bag X2046, Mmabatho 2735, South Africa.
Department of Microbiology and Parasitology, School of Medicine and Pharmacy, College of Medicine and Health Sciences, University of Rwanda, Butare, Rwanda.
Data Brief. 2024 Apr 15;54:110381. doi: 10.1016/j.dib.2024.110381. eCollection 2024 Jun.
Microorganisms inhabiting caves show medical and biotechnological promise, much of which has been attributed to properties such as antimicrobial activity or the induction of mineral precipitation. This dataset presents the shotgun metagenomic sequencing of the Cango cave microbial community in Oudtshoorn, South Africa. The study aimed to elucidate both the structure and the function of the microbial community associated with the cave. DNA sequencing was conducted on the Illumina NovaSeq platform, a next-generation sequencing technology. The data comprise 4,738,604 sequences, with a cumulative size of 1,180,744,252 base pairs and a GC content of 52%. The metagenome sequence data can be accessed on NCBI under BioProject number PRJNA982691. Analysis on the online metagenomics server MG-RAST, using the subsystem database, showed that bacteria had the highest taxonomic representation, at about 98.66%. Archaea accounted for 0.05%, eukaryotes for 1.20%, viruses for 0.07%, and unclassified sequences for 0.02%. The most abundant phyla were (81.74%), (10.57%), (4.16%), (SK‒1.03%), (0.20), and (SK‒0.16%). Functional annotation by subsystem analysis showed that clustering-based subsystems accounted for 13.44% of sequences, while amino acids and derivatives comprised 11.41%. Carbohydrate sequences constituted 9.55%, alongside other beneficial functional traits essential for growth promotion and plant management.
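The summary figures reported above can be sanity-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline: the constant names and the helper functions `mean_sequence_length` and `gc_content` are hypothetical and introduced here only for illustration. The mean sequence length is a derived figure that the abstract does not state, and the toy read used for the GC example is made up.

```python
# Figures reported in the abstract.
TOTAL_SEQUENCES = 4_738_604
TOTAL_BASE_PAIRS = 1_180_744_252

# Domain-level shares (percent) reported from the MG-RAST annotation.
DOMAIN_SHARES = {
    "Bacteria": 98.66,
    "Eukaryota": 1.20,
    "Viruses": 0.07,
    "Archaea": 0.05,
    "Unclassified": 0.02,
}


def mean_sequence_length(total_bp: int, n_sequences: int) -> float:
    """Average sequence length in base pairs (derived, not reported)."""
    return total_bp / n_sequences


def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide string, ignoring case."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


mean_len = mean_sequence_length(TOTAL_BASE_PAIRS, TOTAL_SEQUENCES)
share_total = round(sum(DOMAIN_SHARES.values()), 2)

print(f"Mean sequence length: {mean_len:.1f} bp")
print(f"Domain shares sum to: {share_total:.2f}%")
# Toy read (made up) to show how a GC percentage is computed.
print(f"GC content of 'ATGCGC': {gc_content('ATGCGC'):.1f}%")
```

Under these figures the mean sequence length works out to roughly 249 bp, and the five domain-level shares add up to 100%. To reproduce the reported GC content of 52%, `gc_content` would have to be applied to the concatenated reads downloaded from the BioProject rather than to a toy string.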