Chair for Clinical Bioinformatics, Saarland University, Campus Building E2.1, 66123 Saarbrücken, Germany.
Nucleic Acids Res. 2017 Jul 3;45(W1):W171-W179. doi: 10.1093/nar/gkx348.
Metagenomics-based studies of mixed microbial communities are impacting biotechnology, life sciences and medicine. Computational binning of metagenomic data is a powerful approach for the culture-independent recovery of population-resolved genomic sequences, i.e. from individual or closely related, constituent microorganisms. Existing binning solutions often require a priori characterized reference genomes and/or dedicated compute resources. Extending currently available reference-independent binning tools, we developed the BusyBee Web server for the automated deconvolution of metagenomic data into population-level genomic bins using assembled contigs (Illumina) or long reads (Pacific Biosciences, Oxford Nanopore Technologies). A reversible compression step as well as bootstrapped supervised binning enable quick turnaround times. The binning results are represented in interactive 2D scatterplots. Moreover, bin quality estimates, taxonomic annotations and annotations of antibiotic resistance genes are computed and visualized. Ground truth-based benchmarks of BusyBee Web demonstrate comparably high performance to state-of-the-art binning solutions for assembled contigs and markedly improved performance for long reads (median F1 scores: 70.02-95.21%). Furthermore, the applicability to real-world metagenomic datasets is shown. In conclusion, our reference-independent approach automatically bins assembled contigs or long reads, exhibits high sensitivity and precision, enables intuitive inspection of the results, and only requires FASTA-formatted input. The web-based application is freely accessible at: https://ccb-microbe.cs.uni-saarland.de/busybee.
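The abstract reports median F1 scores from ground truth-based benchmarks but does not state how they were computed. The following is a minimal sketch of one common convention, not the authors' evaluation procedure: each bin is matched to the reference genome that contributes most of its sequences, then precision, recall and F1 are computed per bin. The function name `bin_f1_scores`, the dictionary inputs, and counting by number of sequences (rather than by base pairs) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import median


def bin_f1_scores(assignments, truth):
    """Return {bin_id: (precision, recall, f1)}.

    assignments: sequence ID -> bin ID (hypothetical binner output)
    truth:       sequence ID -> source genome (ground truth)
    Assumption: each bin is scored against its majority genome, and every
    sequence counts equally regardless of its length.
    """
    genome_sizes = Counter(truth.values())

    # Collect the true genome label of every sequence placed in each bin.
    members = defaultdict(list)
    for seq_id, bin_id in assignments.items():
        members[bin_id].append(truth[seq_id])

    scores = {}
    for bin_id, genomes in members.items():
        genome, hits = Counter(genomes).most_common(1)[0]
        precision = hits / len(genomes)         # purity of the bin
        recall = hits / genome_sizes[genome]    # share of the genome recovered
        f1 = 2 * precision * recall / (precision + recall)
        scores[bin_id] = (precision, recall, f1)
    return scores


# Toy ground truth: three contigs from genome A, two from genome B.
truth = {"c1": "A", "c2": "A", "c3": "A", "c4": "B", "c5": "B"}
assignments = {"c1": "bin1", "c2": "bin1", "c3": "bin1",
               "c4": "bin2", "c5": "bin1"}

scores = bin_f1_scores(assignments, truth)
median_f1 = median(f1 for _, _, f1 in scores.values())
```

In this toy case `bin1` recovers all of genome A but is contaminated by one contig from genome B, while `bin2` is pure but holds only half of genome B. Published benchmarks frequently weight sequences by length instead, which would change the numbers.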