IBM Research, The Hartree Centre, Warrington, WA4 4AD, UK.
IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598, USA.
Microbiome. 2021 Jan 9;9(1):4. doi: 10.1186/s40168-020-00971-1.
Widespread bioinformatic resource development generates a constantly evolving and abundant landscape of workflows and software. For analysis of the microbiome, workflows typically begin with taxonomic classification of the microorganisms that are present in a given environment. Additional investigation is then required to uncover the functionality of the microbial community, in order to characterize its currently or potentially active biological processes. Such functional analysis of metagenomic data can be computationally demanding for high-throughput sequencing experiments. As an alternative, sequencing reads can be compared directly to a functionally annotated database. However, since reads frequently match multiple sequences equally well, analyses benefit from a hierarchical annotation tree, as in taxonomic classification, where reads are assigned to the lowest taxonomic unit.
To facilitate functional microbiome analysis, we re-purpose well-known taxonomic classification tools to allow us to perform direct functional sequencing read classification with the added benefit of a functional hierarchy. To enable this, we develop and present a tree-shaped functional hierarchy representing the molecular function subset of the Gene Ontology annotation structure. We use this functional hierarchy to replace the standard phylogenetic taxonomy used by the classification tools and assign query sequences accurately to the lowest possible molecular function in the tree. We demonstrate this with simulated and experimental datasets, where we reveal new biological insights.
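The idea of assigning a read to the lowest possible node of a functional tree can be illustrated with a minimal sketch. It assumes a lowest-common-ancestor style rule, as used by several taxonomic classifiers; the individual tools may resolve ambiguous matches differently. The child-to-parent map below is a hypothetical, heavily simplified toy tree built from a few Gene Ontology molecular function terms, not the hierarchy developed in the paper, and the function names are illustrative.

```python
# Toy child -> parent map (assumption: a simplified tree for illustration only).
PARENT = {
    "GO:0003674": None,          # molecular_function (root)
    "GO:0003824": "GO:0003674",  # catalytic activity
    "GO:0005488": "GO:0003674",  # binding
    "GO:0016787": "GO:0003824",  # hydrolase activity
    "GO:0016740": "GO:0003824",  # transferase activity
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_node(hits):
    """Return the deepest node shared by the root paths of all hits."""
    if not hits:
        raise ValueError("a read needs at least one hit to be classified")
    paths = [path_to_root(hit) for hit in hits]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking upward from the first hit, the first shared node is the deepest one.
    for node in paths[0]:
        if node in shared:
            return node


# A read matching one function keeps that function; a read matching two
# sibling functions equally well is assigned to their common parent.
print(lowest_common_node(["GO:0016787"]))
print(lowest_common_node(["GO:0016787", "GO:0016740"]))
```

Under this rule a read with a single unambiguous hit keeps its specific term, while a read that hits both hydrolase and transferase sequences is reported at the catalytic activity level rather than being discarded or assigned arbitrarily.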
We demonstrate that improved functional classification of metagenomic sequencing reads is possible by re-purposing a range of taxonomic classification tools that are already well-established, in conjunction with either protein or nucleotide reference databases. We leverage the advances in speed, accuracy and efficiency that have been made for taxonomic classification and translate these benefits for the rapid functional classification of microbiomes. While we focus on a specific set of commonly used methods, the functional annotation approach has broad applicability across other sequence classification tools. We hope that re-purposing becomes a routine consideration during bioinformatic resource development.