BASALT refines binning from metagenomic data and increases resolution of genome-resolved metagenomic analysis.

Author information

Qiu Zhiguang, Yuan Li, Lian Chun-Ang, Lin Bin, Chen Jie, Mu Rong, Qiao Xuejiao, Zhang Liyu, Xu Zheng, Fan Lu, Zhang Yunzeng, Wang Shanquan, Li Junyi, Cao Huiluo, Li Bing, Chen Baowei, Song Chi, Liu Yongxin, Shi Lili, Tian Yonghong, Ni Jinren, Zhang Tong, Zhou Jizhong, Zhuang Wei-Qin, Yu Ke

Affiliations

Eco-environment and Resource Efficiency Research Laboratory, School of Environment and Energy, Shenzhen Graduate School, Peking University, Shenzhen, China.

AI for Science (AI4S)-Preferred Program, Peking University, Shenzhen, China.

Publication information

Nat Commun. 2024 Mar 11;15(1):2179. doi: 10.1038/s41467-024-46539-7.

Abstract

Metagenomic binning is an essential technique for genome-resolved characterization of uncultured microorganisms in various ecosystems, but it is hampered by the low efficiency of binning tools in adequately recovering metagenome-assembled genomes (MAGs). Here, we introduce BASALT (Binning Across a Series of Assemblies Toolkit) for binning and refinement of short- and long-read sequencing data. BASALT employs multiple binners with multiple thresholds to produce initial bins, then uses neural networks to identify core sequences, remove redundant bins, and refine non-redundant bins. Using the same assemblies generated from Critical Assessment of Metagenome Interpretation (CAMI) datasets, BASALT produces up to twice as many MAGs as VAMB, DASTool, or metaWRAP. When processing assemblies from a lake sediment dataset, BASALT produces ~30% more MAGs than metaWRAP, including 21 unique class-level prokaryotic lineages. Functional annotations reveal that BASALT retrieves 47.6% more non-redundant open reading frames than metaWRAP. These results highlight BASALT's robust handling of metagenomic sequencing data.
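The idea of removing redundant bins after running several binners can be illustrated with a minimal sketch. This is not BASALT's method: the abstract states that BASALT uses neural networks to identify core sequences, whereas the example below swaps in a simple greedy filter based on contig overlap (Jaccard similarity). The bin names, the quality score (imagine something like completeness minus contamination), and the 0.5 overlap threshold are all hypothetical values chosen for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical bin record: (bin name, contig IDs in the bin, quality score).
# The score is an assumed stand-in for any per-bin quality estimate.
Bin = Tuple[str, Set[str], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared contigs divided by all contigs present in either bin."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dereplicate(bins: List[Bin], threshold: float = 0.5) -> List[Bin]:
    """Greedy filter: visit bins from highest to lowest score and keep a bin
    only if its contig overlap with every already-kept bin is below threshold."""
    kept: List[Bin] = []
    for candidate in sorted(bins, key=lambda b: b[2], reverse=True):
        if all(jaccard(candidate[1], k[1]) < threshold for k in kept):
            kept.append(candidate)
    return kept


# Toy input: two binners run at two thresholds produce overlapping bins.
bins: List[Bin] = [
    ("binnerA_t1.001", {"c1", "c2", "c3", "c4"}, 0.90),
    ("binnerB_t1.007", {"c1", "c2", "c3"}, 0.80),
    ("binnerA_t2.003", {"c5", "c6"}, 0.70),
    ("binnerB_t2.002", {"c5", "c6", "c7"}, 0.75),
]

kept_names = [name for name, _, _ in dereplicate(bins)]
print(kept_names)  # ['binnerA_t1.001', 'binnerB_t2.002']
```

In this toy input, each pair of overlapping bins collapses to the higher-scoring member. A real workflow would go on to refine the surviving bins, a step this sketch does not attempt.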


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dab0/10928208/5d388e130bd6/41467_2024_46539_Fig1_HTML.jpg
