Enagbonma Ben Jesuorsemwen, Amoo Adenike Eunice, Babalola Olubukola Oluranti
Food Security and Safety Niche, Faculty of Natural and Agricultural Sciences, North-West University, Private Mail Bag X2046, Mmabatho, 2735, South Africa.
Data Brief. 2019 Nov 13;28:104802. doi: 10.1016/j.dib.2019.104802. eCollection 2020 Feb.
We present the metagenomic dataset of the microbial DNA of a termite mound in the North West Province of South Africa. This is the first report in the Province to reveal the microbial diversity of termite mound soil using the shotgun metagenomics approach. Next-generation sequencing of the community DNA was carried out on an Illumina MiSeq platform. The metagenome comprised 7,270,818 sequences totaling 1,172,099,467 bp, with a mean length of 161 bp and a G + C content of 52%. The sequence data are accessible at the NCBI SRA under BioProject accession PRJNA526912. Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST) was used for community analysis: 0.36% of sequences were of archaeal origin, 9.51% were eukaryotic, and 90.01% were assigned to bacteria. A total of 5 archaeal, 27 bacterial, and 22 eukaryotic phyla were identified. The abundant genera were (6.00%), (5.00%), (4.00%), (3.00%), and (3.00%), representing 19.23% of the metagenome. For functional analysis, Clusters of Orthologous Groups (COG)-based annotation showed that 46.44% of sequences were associated with metabolism and 17.45% fell in the poorly characterized category. Subsystem-based annotation indicated that 14.00% of sequences were associated with carbohydrates, 13.00% with clustering-based subsystems, and 10.00% with amino acids and derivatives, along with other traits of potential scientific use.
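The summary statistics in the abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the authors' analysis: the function and variable names are hypothetical, and the only inputs are the figures quoted in the abstract. It recomputes the mean sequence length from the total base pairs and sequence count, and reports the share of sequences not covered by the three reported domain percentages.

```python
# Figures copied from the abstract; all names below are illustrative only.
TOTAL_SEQUENCES = 7_270_818
TOTAL_BASE_PAIRS = 1_172_099_467

# Domain-level shares (percent of sequences) as reported from MG-RAST.
DOMAIN_SHARES = {"Archaea": 0.36, "Eukaryota": 9.51, "Bacteria": 90.01}


def mean_read_length(total_bp: int, n_sequences: int) -> float:
    """Mean sequence length in base pairs."""
    return total_bp / n_sequences


def remaining_share(shares: dict) -> float:
    """Percent of sequences not covered by the listed domains."""
    return 100.0 - sum(shares.values())


mean_len = mean_read_length(TOTAL_BASE_PAIRS, TOTAL_SEQUENCES)
remainder = remaining_share(DOMAIN_SHARES)

print(f"mean sequence length: {mean_len:.1f} bp")
print(f"share outside the three domains: {remainder:.2f}%")
```

The recomputed mean rounds to the 161 bp stated in the abstract. The three domain percentages sum to slightly less than 100%; the abstract does not say how the remaining sequences were classified.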