Metaphor: A workflow for streamlined assembly and binning of metagenomes.

Author information

Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Parkville, VIC 3052, Victoria, Australia.

Melbourne Data Analytics Platform (MDAP), University of Melbourne, Carlton, VIC 3053, Victoria, Australia.

Publication information

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad055. Epub 2023 Jul 31.

Abstract

Recent advances in bioinformatics and high-throughput sequencing have enabled the large-scale recovery of genomes from metagenomes. This has the potential to bring important insights, as researchers can bypass cultivation and analyze genomes sourced directly from environmental samples. There are, however, technical challenges associated with this process, most notably the complexity of the computational workflows required to process metagenomic data. These workflows include dozens of bioinformatics software tools, each with its own set of customizable parameters that affect the final output of the workflow. At the core of these workflows are two processes: assembly, which combines short input reads into longer, contiguous fragments (contigs), and binning, which clusters these contigs into individual genome bins. The limitations of assembly and binning algorithms also pose different challenges depending on the strategy selected to execute them. Both processes can be run on each sample separately or on multiple samples pooled together to leverage information from a combination of samples. Here we present Metaphor, a fully automated workflow for genome-resolved metagenomics (GRM). Metaphor differs from existing GRM workflows by offering flexible approaches for the assembly and binning of the input data and by combining multiple binning algorithms with a bin refinement step to achieve high-quality genome bins. Moreover, Metaphor generates reports to evaluate the performance of the workflow. We showcase the functionality of Metaphor on different synthetic datasets and the impact of available assembly and binning strategies on the final results.
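
The distinction between per-sample and pooled processing can be illustrated with a minimal Python sketch. It is not taken from Metaphor: the function name `plan_assemblies`, the strategy labels `"individual"` and `"pooled"`, and the job name `"coassembly"` are all hypothetical, chosen only to show how the same set of samples maps to different assembly jobs under each strategy.

```python
from typing import Dict, List


def plan_assemblies(samples: List[str], strategy: str) -> Dict[str, List[str]]:
    """Map each assembly job name to the samples whose reads feed it.

    Hypothetical helper for illustration only; it does not reflect
    Metaphor's actual configuration or code.
    """
    if strategy == "individual":
        # One assembly per sample: each job sees only its own reads.
        return {sample: [sample] for sample in samples}
    if strategy == "pooled":
        # One co-assembly: reads from every sample are combined.
        return {"coassembly": list(samples)}
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    samples = ["sample_A", "sample_B", "sample_C"]
    for strategy in ("individual", "pooled"):
        print(strategy, plan_assemblies(samples, strategy))
```

Under the individual strategy, three samples yield three independent jobs. Under the pooled strategy, they yield a single job that draws on all samples at once.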

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1206/10388702/015d2e14717a/giad055fig1.jpg
