Joint analysis of multiple metagenomic samples.

Affiliation

School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.

Publication information

PLoS Comput Biol. 2012;8(2):e1002373. doi: 10.1371/journal.pcbi.1002373. Epub 2012 Feb 16.

DOI: 10.1371/journal.pcbi.1002373

PMID: 22359490

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3280959/

Abstract

The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task, which has been extensively attempted by both supervised and unsupervised techniques, each with its own limitations. Common to practically all the methods is the processing of single samples only; when multiple samples are sequenced, each is analyzed separately and the results are combined. In this paper we propose to perform a combined analysis of a set of samples in order to obtain a better characterization of each of the samples, and provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model in a novel framework for studying association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes. We demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed "binning") algorithm which operates on multiple samples simultaneously, leveraging the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning on each of the samples. Moreover, for both applications we demonstrate that given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples as long as the per-sample coverage is high enough.
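The idea of inferring hidden components shared across samples can be illustrated with a minimal sketch. It assumes each sample is summarized as a vector of feature counts (for example k-mer counts) and fits a generic pLSA-style multinomial mixture by EM. This is a stand-in chosen for illustration, not the model described in the paper, and the names `fit_shared_components`, `log_likelihood`, and `simulate` are hypothetical.

```python
import numpy as np


def fit_shared_components(counts, n_components, n_iter=200, seed=0):
    """EM for a pLSA-style mixture (illustrative assumption, not the paper's model).

    counts: (n_samples, n_features) non-negative array, e.g. k-mer counts.
    Returns theta (per-sample component proportions, rows sum to 1) and
    phi (per-component feature distributions, rows sum to 1).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_samples, n_features = counts.shape
    theta = rng.dirichlet(np.ones(n_components), size=n_samples)
    phi = rng.dirichlet(np.ones(n_features), size=n_components)
    for _ in range(n_iter):
        mix = theta @ phi + 1e-12   # modelled feature distribution per sample
        ratio = counts / mix        # factor shared by all E-step responsibilities
        # M-step: expected counts attributed to each component, then normalize.
        new_theta = theta * (ratio @ phi.T)
        new_phi = phi * (theta.T @ ratio)
        theta = new_theta / new_theta.sum(axis=1, keepdims=True)
        phi = new_phi / new_phi.sum(axis=1, keepdims=True)
    return theta, phi


def log_likelihood(counts, theta, phi):
    """Multinomial log-likelihood of the counts, up to an additive constant."""
    return float(np.sum(counts * np.log(theta @ phi + 1e-12)))


def simulate(n_samples=30, n_features=64, n_components=3, depth=5000, seed=1):
    """Synthetic samples that share components in different proportions."""
    rng = np.random.default_rng(seed)
    true_phi = rng.dirichlet(np.full(n_features, 0.3), size=n_components)
    true_theta = rng.dirichlet(np.ones(n_components), size=n_samples)
    counts = np.vstack(
        [rng.multinomial(depth, p) for p in true_theta @ true_phi]
    )
    return counts, true_theta, true_phi


if __name__ == "__main__":
    counts, _, _ = simulate()
    theta, phi = fit_shared_components(counts, n_components=3)
    print(theta.shape, phi.shape)
```

In a stratification-correction setting, the rows of `theta` could then serve as covariates when testing a sequence feature for association with a phenotype. That use is likewise only an illustration of the principle stated in the abstract.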

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f3df/3280959/9f19a47aed02/pcbi.1002373.g001.jpg

Similar articles

CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision.

BMC Bioinformatics. 2017 Dec 28;18(Suppl 16):571. doi: 10.1186/s12859-017-1967-3.

Accurate genome relative abundance estimation based on shotgun metagenomic reads.

PLoS One. 2011;6(12):e27992. doi: 10.1371/journal.pone.0027992. Epub 2011 Dec 6.

Binning Metagenomic Contigs Using Unsupervised Clustering and Reference Databases.

Interdiscip Sci. 2022 Dec;14(4):795-803. doi: 10.1007/s12539-022-00526-y. Epub 2022 May 31.

Exploiting topic modeling to boost metagenomic reads binning.

BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S2. doi: 10.1186/1471-2105-16-S5-S2. Epub 2015 Mar 18.

MetaProb: accurate metagenomic reads binning based on probabilistic sequence signatures.

Bioinformatics. 2016 Sep 1;32(17):i567-i575. doi: 10.1093/bioinformatics/btw466.

MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage.

BMC Bioinformatics. 2019 Nov 22;20(Suppl 9):367. doi: 10.1186/s12859-019-2904-4.

Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets.

BMC Bioinformatics. 2020 Jul 28;21(1):334. doi: 10.1186/s12859-020-03667-3.

A framework for space-efficient read clustering in metagenomic samples.

BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):59. doi: 10.1186/s12859-017-1466-6.

A novel abundance-based algorithm for binning metagenomic sequences using l-tuples.

J Comput Biol. 2011 Mar;18(3):523-34. doi: 10.1089/cmb.2010.0245.

Cited by

Four functional profiles for fibre and mucin metabolism in the human gut microbiome.

Microbiome. 2023 Oct 20;11(1):231. doi: 10.1186/s40168-023-01667-y.

Evaluating the number of different genomes in a metagenome by means of the compositional spectra approach.

PLoS One. 2020 Nov 6;15(11):e0237205. doi: 10.1371/journal.pone.0237205. eCollection 2020.

A framework for space-efficient read clustering in metagenomic samples.

BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):59. doi: 10.1186/s12859-017-1466-6.

Inferring Aggregated Functional Traits from Metagenomic Data Using Constrained Non-negative Matrix Factorization: Application to Fiber Degradation in the Human Gut Microbiota.

PLoS Comput Biol. 2016 Dec 16;12(12):e1005252. doi: 10.1371/journal.pcbi.1005252. eCollection 2016 Dec.

GUTSS: An Alignment-Free Sequence Comparison Method for Use in Human Intestinal Microbiome and Fecal Microbiota Transplantation Analysis.

PLoS One. 2016 Jul 8;11(7):e0158897. doi: 10.1371/journal.pone.0158897. eCollection 2016.

Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples.

Front Genet. 2015 Dec 17;6:350. doi: 10.3389/fgene.2015.00350. eCollection 2015.

Accurate, multi-kb reads resolve complex populations and detect rare microorganisms.

Genome Res. 2015 Apr;25(4):534-43. doi: 10.1101/gr.183012.114. Epub 2015 Feb 9.

Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods.

Bioinformatics. 2015 Mar 15;31(6):817-24. doi: 10.1093/bioinformatics/btu745. Epub 2014 Nov 10.

Exploration and retrieval of whole-metagenome sequencing samples.

Bioinformatics. 2014 Sep 1;30(17):2471-9. doi: 10.1093/bioinformatics/btu340. Epub 2014 May 19.

Reconstructing the genomic content of microbiome taxa through shotgun metagenomic deconvolution.

PLoS Comput Biol. 2013;9(10):e1003292. doi: 10.1371/journal.pcbi.1003292. Epub 2013 Oct 17.
