Utro Filippo, Haiminen Niina, Siragusa Enrico, Gardiner Laura-Jayne, Seabolt Ed, Krishna Ritesh, Kaufman James H, Parida Laxmi
IBM Research, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.
IBM Research, The Hartree Centre, Warrington, WA4 4AD, UK.
iScience. 2020 Apr 24;23(4):100988. doi: 10.1016/j.isci.2020.100988. Epub 2020 Mar 17.
Increasingly available microbial reference data allow the composition and function of previously uncharacterized microbial communities to be interpreted in detail via high-throughput sequencing analysis. However, efficient methods for read classification are required when the best database matches for short sequence reads are often shared among multiple reference sequences. Here, we take advantage of the fact that microbial sequences can be annotated relative to established tree structures, and we develop a highly scalable read classifier, PRROMenade, by enhancing the generalized Burrows-Wheeler transform with a labeling step to directly assign reads to the corresponding lowest taxonomic unit in an annotation tree. PRROMenade solves the multi-matching problem while allowing fast variable-size sequence classification for phylogenetic or functional annotation. Our simulations with 5% added differences from the reference indicated an error rate of only 1.5% for PRROMenade functional classification. On metatranscriptomic data, PRROMenade highlighted biologically relevant functional pathways related to diet-induced changes in the human gut microbiome.
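The multi-matching idea in the abstract can be illustrated with a small sketch: when a read matches several reference sequences, it is assigned to the lowest node of the annotation tree that covers all of the matches. This is not PRROMenade's implementation. It assumes a toy tree and made-up reference sequences, and it swaps the generalized Burrows-Wheeler transform for a naive exact substring search over whole reads, so it does not show variable-size matching. All names (`PARENT`, `REFERENCES`, `classify`) are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy annotation tree: child -> parent (the root has parent None).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Lactobacillus": "Firmicutes",
}

# Made-up reference sequences, each labeled with a leaf of the tree.
REFERENCES: Dict[str, str] = {
    "Escherichia": "ACGTTGCAAGGCTTAC",
    "Salmonella": "ACGTTGCAAGGCATTG",
    "Lactobacillus": "TTGACCGGATACGTTG",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, lowest first."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def lowest_common_ancestor(nodes: List[str]) -> str:
    """Return the lowest tree node that is an ancestor of (or equal to) every input node."""
    common = path_to_root(nodes[0])
    for node in nodes[1:]:
        ancestors = set(path_to_root(node))
        # Keep only nodes shared with this path; order stays lowest first.
        common = [n for n in common if n in ancestors]
    return common[0]


def classify(read: str) -> Optional[str]:
    """Assign a read to the lowest tree node covering all references it matches exactly."""
    hits = [label for label, seq in REFERENCES.items() if read in seq]
    if not hits:
        return None  # unclassified: no reference contains the read
    return lowest_common_ancestor(hits)


if __name__ == "__main__":
    for read in ["GGCTTAC", "ACGTTGCAAGGC", "ACGTTG", "AAAAAAA"]:
        print(read, "->", classify(read))
```

A read unique to one reference keeps its leaf label, while a read shared by two genera in the same phylum is pushed up to that phylum. In an index-based classifier, such labels would be attached to the index ahead of time rather than computed per read by scanning every reference as done here.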