Centre for Bioinformatics and Computational Biology, Department of Biochemistry, University of Pretoria, Lynnwood Rd, Hillcrest, Pretoria, 0002, South Africa.
Biotechnology Platform, Agricultural Research Council, Onderstepoort, South Africa.
BMC Bioinformatics. 2018 Aug 30;19(1):309. doi: 10.1186/s12859-018-2320-1.
Metagenomic approaches have revealed the complexity of environmental microbiomes, and advances in whole genome sequencing have shown a significant level of genetic heterogeneity at the species level. It has become apparent that identifying bacteria with superior bioactivity for biotechnology, or pathogens with enhanced virulence, often requires distinguishing between closely related species or sub-species. Current methods for binning metagenomic reads usually do not allow identification below the genus level and generally stop at the family level.
In this work, an attempt was made to improve metagenomic binning resolution by creating genome-specific barcodes based on the core and accessory genomes. This protocol was implemented in novel software tools available for use and download from http://bargene.bi.up.ac.za/. The most abundant barcode genes from the core genomes were found to encode ribosomal proteins, certain central metabolic genes and ABC transporters. The performance of metabarcode sequences created by this package was evaluated using artificially generated and publicly available metagenomic datasets. Furthermore, a program (Barcoding 2.0) was developed to align reads against barcode sequences and then calculate various parameters to score the alignments and the individual barcodes. Taxonomic units were identified in metagenomic samples by comparing the calculated barcode scores to set cut-off values. In this study, varying the sample size (i.e. the number of reads in a metagenome) and the metabarcode length had no significant effect on the sensitivity and specificity of the algorithm. Receiver operating characteristic (ROC) curves were calculated for different taxonomic groups based on the results of identifying the corresponding genomes in artificial metagenomic datasets. The program distinguished between species of the same genus or family with nearly perfect reliability.
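The cut-off comparison and ROC evaluation described above can be illustrated with a minimal Python sketch. It is not the Barcoder implementation: the abstract does not say how barcode scores are computed, so the sketch assumes each candidate taxon already has a single numeric score, and the function names, taxon labels and score values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def call_taxa(scores: Dict[str, float], cutoff: float) -> Set[str]:
    """Report every taxon whose barcode score reaches the cut-off."""
    return {taxon for taxon, score in scores.items() if score >= cutoff}


def sensitivity_specificity(
    scores: Dict[str, float], present: Set[str], cutoff: float
) -> Tuple[float, float]:
    """Compare calls at one cut-off against the known sample composition."""
    called = call_taxa(scores, cutoff)
    absent = set(scores) - present
    tp = len(called & present)   # present and reported
    fn = len(present - called)   # present but missed
    fp = len(called & absent)    # absent but reported
    tn = len(absent - called)    # absent and not reported
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def roc_points(
    scores: Dict[str, float], present: Set[str]
) -> List[Tuple[float, float]]:
    """Sweep the cut-off over all observed scores, strictest first.

    Returns (false positive rate, true positive rate) pairs.
    """
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores.values()), reverse=True):
        sens, spec = sensitivity_specificity(scores, present, cutoff)
        points.append((1.0 - spec, sens))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for four candidate taxa in an artificial metagenome
# that was built to contain only taxon_A and taxon_B.
example_scores = {"taxon_A": 0.9, "taxon_B": 0.8, "taxon_C": 0.3, "taxon_D": 0.1}
example_present = {"taxon_A", "taxon_B"}

if __name__ == "__main__":
    print(sensitivity_specificity(example_scores, example_present, cutoff=0.5))
    print(auc(roc_points(example_scores, example_present)))
```

In this toy case the two taxa that are present score higher than the two that are absent, so any cut-off between 0.3 and 0.8 separates them cleanly. With an artificial metagenome the true composition is known, which is what makes sensitivity and specificity computable at each cut-off.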
The results showed that the novel online tool BarcodeGenerator (http://bargene.bi.up.ac.za/) is an efficient approach for generating barcode sequences from a set of complete genomes provided by users. Another program, Barcoder 2.0, is available from the same resource and enables efficient, practical use of metabarcodes for visualizing the distribution of organisms of interest in environmental and clinical samples.