Department of Statistics, Stanford University, Stanford, California, United States of America.
PLoS One. 2013 Apr 22;8(4):e61217. doi: 10.1371/journal.pone.0061217. Print 2013.
BACKGROUND: The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high-throughput microbiome census data.
RESULTS: Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics, all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open-source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, as an example of best practices for reproducible research.
CONCLUSIONS: The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.
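The idea of an object-oriented representation of census data can be illustrated with a small sketch. The Python code below is not phyloseq (which is an R package) and does not reproduce its API; the class `CensusData`, its methods, and the toy counts are all hypothetical, chosen only to show how keeping the count table, taxonomy, and sample annotations in one validated object makes subsetting, agglomeration, and a diversity calculation (here the Shannon index) straightforward.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import log
from typing import Callable


@dataclass
class CensusData:
    """Hypothetical container: counts[taxon][sample] plus annotations."""

    counts: dict[str, dict[str, int]]
    taxonomy: dict[str, dict[str, str]]      # taxon -> {rank: name}
    sample_data: dict[str, dict[str, str]]   # sample -> {variable: value}

    def __post_init__(self) -> None:
        # Check that the three component tables describe the same taxa and samples.
        samples = set(self.sample_data)
        for taxon, row in self.counts.items():
            if taxon not in self.taxonomy:
                raise ValueError(f"no taxonomy entry for {taxon}")
            if set(row) != samples:
                raise ValueError(f"sample mismatch in counts for {taxon}")

    def subset_samples(self, keep: Callable[[dict[str, str]], bool]) -> CensusData:
        """Keep only samples whose annotations satisfy the predicate."""
        kept = {s: meta for s, meta in self.sample_data.items() if keep(meta)}
        counts = {
            taxon: {s: n for s, n in row.items() if s in kept}
            for taxon, row in self.counts.items()
        }
        return CensusData(counts, self.taxonomy, kept)

    def agglomerate(self, rank: str) -> CensusData:
        """Sum counts of all taxa that share the same name at the given rank."""
        merged: dict[str, dict[str, int]] = {}
        for taxon, row in self.counts.items():
            name = self.taxonomy[taxon][rank]
            target = merged.setdefault(name, {s: 0 for s in self.sample_data})
            for s, n in row.items():
                target[s] += n
        taxonomy = {name: {rank: name} for name in merged}
        return CensusData(merged, taxonomy, self.sample_data)

    def shannon(self) -> dict[str, float]:
        """Shannon diversity index (natural log) for each sample."""
        result: dict[str, float] = {}
        for s in self.sample_data:
            present = [row[s] for row in self.counts.values() if row[s] > 0]
            total = sum(present)
            result[s] = -sum((n / total) * log(n / total) for n in present) if total else 0.0
        return result


# Toy data, invented for illustration only.
toy = CensusData(
    counts={
        "otu1": {"gut1": 10, "gut2": 0, "soil1": 5},
        "otu2": {"gut1": 10, "gut2": 20, "soil1": 5},
        "otu3": {"gut1": 0, "gut2": 20, "soil1": 10},
    },
    taxonomy={
        "otu1": {"Phylum": "Firmicutes"},
        "otu2": {"Phylum": "Firmicutes"},
        "otu3": {"Phylum": "Bacteroidetes"},
    },
    sample_data={
        "gut1": {"site": "gut"},
        "gut2": {"site": "gut"},
        "soil1": {"site": "soil"},
    },
)

gut_only = toy.subset_samples(lambda meta: meta["site"] == "gut")
by_phylum = toy.agglomerate("Phylum")
diversity = toy.shannon()
```

Because every operation returns a new validated object, steps such as `toy.subset_samples(...).agglomerate("Phylum").shannon()` can be chained without the component tables drifting out of sync, which is the kind of consistency an integrated representation is meant to provide.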