Joint analysis of multiple metagenomic samples.

Affiliation

School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.

Publication information

PLoS Comput Biol. 2012;8(2):e1002373. doi: 10.1371/journal.pcbi.1002373. Epub 2012 Feb 16.

Abstract

The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task that has been attempted extensively with both supervised and unsupervised techniques, each with its own limitations. Common to practically all of these methods is that they process single samples only: when multiple samples are sequenced, each is analyzed separately and the results are then combined. In this paper we propose to perform a combined analysis of a set of samples in order to obtain a better characterization of each sample, and we provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model into a novel framework for studying the association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes. We demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed "binning") algorithm that operates on multiple samples simultaneously, leveraging the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning on each of the samples. Moreover, for both applications we demonstrate that, given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples, as long as the per-sample coverage is high enough.
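The stratification effect mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the method used in the paper: it assumes a single hidden component whose per-sample proportion is already known (in the paper the components are inferred by the mixture model), and it uses simple linear residualization as a stand-in for the correction. All names (`simulate`, `residualize`, `component`, `element`) and all numeric parameters are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def residualize(y, z):
    """Residuals of y after least-squares regression on z (with intercept)."""
    mz, my = mean(z), mean(y)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


def simulate(n_samples=200, seed=0):
    """Toy data: phenotype and element abundance both depend on a hidden
    component proportion, but not on each other (assumed setup)."""
    rng = random.Random(seed)
    component, phenotype, element = [], [], []
    for _ in range(n_samples):
        c = rng.random()  # hidden component proportion in this sample
        component.append(c)
        phenotype.append(2.0 * c + rng.gauss(0, 0.3))
        element.append(5.0 * c + rng.gauss(0, 0.5))
    return component, phenotype, element


component, phenotype, element = simulate()

# Naive test: the element looks associated with the phenotype.
naive = pearson(element, phenotype)

# Adjusted test: correlate what remains after removing the component's effect.
adjusted = pearson(residualize(element, component),
                   residualize(phenotype, component))

print(f"naive correlation:    {naive:.3f}")
print(f"adjusted correlation: {adjusted:.3f}")
```

In this toy setup the naive correlation is strong even though the element has no direct effect on the phenotype, while the adjusted correlation is close to zero. This mirrors the abstract's claim that stratification can produce false associations and that accounting for shared components can correct for them.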
