Department of Medical Microbiology and Immunology, Genome Center, University of California Davis, One Shields Avenue, Davis, CA 95616, USA.
BMC Bioinformatics. 2010 Jun 12;11:317. doi: 10.1186/1471-2105-11-317.
For more than two decades, microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is encoded by ribosomal DNA (16S rDNA) and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly.
We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, and phylogenetic tree construction, as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented "workflow" approach using the open-source Kepler system as a platform.
By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired, and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions.
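As a rough illustration of the collection-oriented workflow idea with provenance tracking, the following Python sketch chains stages over a collection of sequences and logs what each stage did. It is not WATERS or Kepler code: the `Workflow` class, the stage functions, and the provenance record layout are all hypothetical, and the stages are trivial placeholders rather than real alignment, chimera removal, or OTU steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "collection" here is just a mapping from sequence ID to sequence string.
Collection = Dict[str, str]
Stage = Callable[[Collection], Collection]


@dataclass
class Workflow:
    """Hypothetical stand-in for a workflow: an ordered list of named stages."""
    stages: List[tuple] = field(default_factory=list)
    provenance: List[dict] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Workflow":
        # New stages (loosely analogous to "actors") can be appended at will.
        self.stages.append((name, stage))
        return self

    def run(self, data: Collection) -> Collection:
        for name, stage in self.stages:
            result = stage(data)
            # Record which stage ran and how the collection size changed.
            self.provenance.append(
                {"stage": name, "n_in": len(data), "n_out": len(result)}
            )
            data = result
        return data


def drop_short(min_len: int) -> Stage:
    """Placeholder filter standing in for a quality-control step."""
    return lambda seqs: {k: v for k, v in seqs.items() if len(v) >= min_len}


def uppercase(seqs: Collection) -> Collection:
    """Placeholder normalization step."""
    return {k: v.upper() for k, v in seqs.items()}


def run_example() -> tuple:
    """Run a two-stage toy workflow and return (output, provenance log)."""
    wf = Workflow().add("normalize", uppercase).add("length_filter", drop_short(5))
    out = wf.run({"s1": "acgtacgt", "s2": "acg"})
    return out, wf.provenance
```

In this toy version the provenance log is only a list of per-stage counts; a real provenance sub-system would presumably also capture parameters, tool versions, and intermediate data.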