Metagenome comparison (MC): A new framework for detecting unique/enriched OMUs (operational metagenomic units) derived from whole-genome sequencing reads.

Author information

Faculty of Arts and Sciences, Harvard University, Cambridge, MA, 02138, USA; Microbiome Medicine and Advanced AI Lab, Cambridge, MA, 02138, USA; Computational Biology and Medical Ecology Lab, Kunming Institute of Zoology, Chinese Academy of Sciences, China.

Publication information

Comput Biol Med. 2024 Sep;180:108852. doi: 10.1016/j.compbiomed.2024.108852. Epub 2024 Aug 12.

Abstract

BACKGROUND

Current methods for comparing metagenomes derived from whole-genome sequencing reads fall into two types: top-down metrics or parametric models, such as metagenome diversity, and bottom-up, non-parametric, model-free machine learning approaches, such as Naïve Bayes for k-mer profiling. However, both types are limited in their ability to identify and catalogue unique or enriched metagenomic genes effectively and comprehensively, a critical task in comparative metagenomics. The task is difficult because the underlying problem is NP-hard: no polynomial-time exact algorithm is known, so the running time of an exhaustive search grows exponentially, or even faster, with problem size, which makes it impractical even on the fastest supercomputers without heuristic approximation algorithms.

METHOD

In this study, we introduce a new framework, MC (Metagenome-Comparison), designed to exhaustively detect and catalogue unique or enriched metagenomic genes (MGs) and their derivatives, including metagenome functional gene clusters (MFGCs) or, more generally, operational metagenomic units (OMUs), which can be considered the counterpart of the OTUs (operational taxonomic units) derived from amplicon sequencing reads. MC is essentially a heuristic search algorithm guided by pairs of new metrics (MG-specificity or OMU-specificity, and MG-specificity diversity or OMU-specificity diversity). It is further constrained by statistical significance (P-value), implemented as a pair of statistical tests.
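
The abstract does not give the formulas for the specificity metrics or name the statistical tests, so the following is only an illustrative sketch of the general idea: score each gene per group, then keep it as unique or enriched only if the difference passes a significance test. It assumes a hypothetical specificity score (mean abundance times prevalence within a group) and uses a permutation test as a stand-in for the paper's tests. The names `specificity`, `permutation_p_value`, and `classify_gene` are invented for this example and are not from the MC software.

```python
import random


def specificity(values):
    """Hypothetical score (an assumption, not the paper's metric):
    mean abundance multiplied by prevalence within one group of samples."""
    n = len(values)
    prevalence = sum(1 for v in values if v > 0) / n
    mean_abundance = sum(values) / n
    return mean_abundance * prevalence


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in specificity.
    Stand-in for the unspecified statistical tests in the abstract."""
    rng = random.Random(seed)
    observed = abs(specificity(group_a) - specificity(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to groups at random
        diff = abs(specificity(pooled[:n_a]) - specificity(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def classify_gene(group_a, group_b, alpha=0.05):
    """Label one gene as 'absent', 'shared', 'unique', or 'enriched'."""
    in_a = any(v > 0 for v in group_a)
    in_b = any(v > 0 for v in group_b)
    if not in_a and not in_b:
        return "absent"
    if permutation_p_value(group_a, group_b) >= alpha:
        return "shared"  # difference not statistically significant
    if in_a != in_b:
        return "unique"  # detected in only one group
    return "enriched"    # detected in both, significantly higher in one


if __name__ == "__main__":
    # Toy per-sample abundances for one gene in two cohorts (made-up numbers).
    print(classify_gene([5, 6, 7, 5, 6, 8], [0, 0, 0, 0, 0, 0]))
    print(classify_gene([20, 22, 19, 25, 21, 23], [1, 0, 2, 1, 1, 0]))
    print(classify_gene([3, 0, 4, 2, 0, 5], [4, 3, 0, 0, 5, 2]))
```

In this sketch a gene is labelled unique only when it is detected in a single group and the difference is significant, and enriched when it is detected in both groups but scores significantly higher in one. A real analysis over millions of genes would also need multiple-testing correction, which is omitted here.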

RESULTS

We evaluated MC using large metagenomic datasets related to obesity, diabetes, and IBD, and found that the proportions of unique and enriched metagenomic genes ranged from 0.001% to 0.08% and from 0.08% to 0.82%, respectively, and that the proportion for the MFGCs was less than 10%.

CONCLUSION

The MC provides a robust method for comparing metagenomes at various scales, from baseline MGs to various function/pathway clusters of metagenomes, collectively termed OMUs.
