Foster Zachary S L, Chamberlain Scott, Grünwald Niklaus J
Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA.
rOpenSci, University of California, Berkeley, CA, 94720, USA.
F1000Res. 2018 Mar 5;7:272. doi: 10.12688/f1000research.14013.2. eCollection 2018.
The taxa R package provides a set of tools for defining and manipulating taxonomic data. The recent and widespread application of DNA sequencing to community composition studies is making large data sets with taxonomic information commonplace. However, compared to typical tabular data, this information is encoded in many different ways, and the hierarchical nature of taxonomic classifications makes it difficult to work with. Many R packages use taxonomic data to varying degrees, but there is currently no cross-package standard for how this information is encoded and manipulated. We developed the R package taxa to provide a robust and flexible solution for storing and manipulating taxonomic data in R, along with any application-specific information associated with it. Taxa provides parsers that can read common sources of taxonomic information (taxon IDs, sequence IDs, taxon names, and classifications) from nearly any format while preserving associated data. Once parsed, the taxonomic data and any associated data can be manipulated using a cohesive set of functions modeled after the popular R package dplyr. These functions take into account the hierarchical nature of taxa and can modify either the taxonomy or the associated data so that the two stay in sync. Taxa is currently used by the metacoder and taxize packages, which provide broadly useful functionality that we hope will speed adoption by users and developers.
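The idea of parsing classifications into a hierarchy and filtering it while keeping associated data in sync can be illustrated with a minimal, language-neutral sketch. This is not the taxa API: the Python names below (TaxMap, parse_classifications, subtaxa, keep_clade) are hypothetical, and the sketch assumes taxon names are unique and that each record's data attaches to the last taxon in its classification.

```python
from dataclasses import dataclass, field


@dataclass
class TaxMap:
    """Hypothetical container: a taxon hierarchy plus per-record data."""
    parent: dict = field(default_factory=dict)  # taxon -> parent taxon (None for roots)
    rows: list = field(default_factory=list)    # (taxon, payload) pairs


def parse_classifications(records, sep=";"):
    """Build a TaxMap from (classification string, payload) pairs."""
    tm = TaxMap()
    for classification, payload in records:
        ranks = [r.strip() for r in classification.split(sep)]
        for i, name in enumerate(ranks):
            # Assumption: a taxon name identifies a taxon uniquely.
            tm.parent.setdefault(name, ranks[i - 1] if i > 0 else None)
        # Attach the payload to the most specific taxon in the record.
        tm.rows.append((ranks[-1], payload))
    return tm


def subtaxa(tm, taxon):
    """Return the given taxon together with all of its descendants."""
    keep = {taxon}
    changed = True
    while changed:
        changed = False
        for child, par in tm.parent.items():
            if par in keep and child not in keep:
                keep.add(child)
                changed = True
    return keep


def keep_clade(tm, taxon):
    """Keep one taxon and its subtaxa; drop data rows for removed taxa."""
    keep = subtaxa(tm, taxon)
    return TaxMap(
        # The retained taxon becomes a root because its parent is gone.
        parent={t: (p if p in keep else None)
                for t, p in tm.parent.items() if t in keep},
        rows=[(t, d) for t, d in tm.rows if t in keep],
    )


records = [
    ("Mammalia;Carnivora;Felidae;Panthera", {"count": 12}),
    ("Mammalia;Carnivora;Canidae;Canis", {"count": 7}),
    ("Mammalia;Primates;Hominidae;Homo", {"count": 3}),
]
tm = parse_classifications(records)
carnivores = keep_clade(tm, "Carnivora")
print(sorted(carnivores.parent))
print(carnivores.rows)
```

Filtering on the hierarchy and on the data in one step is the design point here: the Primates record disappears from the data rows because its taxon was removed, so the two structures cannot drift apart.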