Lynch Michael D J, Neufeld Josh D
Department of Biology, University of Waterloo, Waterloo, Ontario, Canada.
mSystems. 2016 Dec 20;1(6). doi: 10.1128/mSystems.00133-16. eCollection 2016 Nov-Dec.
High-throughput sequencing of small-subunit (SSU) rRNA genes has revolutionized understanding of microbial communities and facilitated investigations into ecological dynamics at unprecedented scales. Such extensive SSU rRNA gene sequence libraries, constructed from DNA extracts of environmental or host-associated samples, often contain a substantial proportion of unclassified sequences, many representing organisms with novel taxonomy (taxonomic "blind spots") and potentially unique ecology. Indeed, these novel taxonomic lineages are associated with so-called microbial "dark matter," the genomic potential harbored by these lineages. Unfortunately, characterizing these sequences beyond "unclassified" is challenging because of relatively short read lengths and large data set sizes. Here we demonstrate how the mining of phylogenetically novel sequences from microbial ecosystems can be automated using SSUnique, a software pipeline that filters unclassified and/or rare operational taxonomic units (OTUs) from 16S rRNA gene sequence libraries by screening against consensus structural models for SSU rRNA. Phylogenetic position is inferred against a reference data set, and the pipeline also supports further characterization of novel clades, such as targeted probe/primer design and mining of assembled metagenomes for genomic context. We show that SSUnique reproduces a previous analysis of phylogenetic novelty in an Arctic tundra soil, and we demonstrate the recovery of highly novel clades from data sets associated with both the Earth Microbiome Project (EMP) and the Human Microbiome Project (HMP). We anticipate that SSUnique will add to the expanding computational toolbox supporting high-throughput sequencing approaches for the study of microbial ecology and phylogeny.
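The selection of unclassified and/or rare OTUs described above can be illustrated with a short sketch. This is not SSUnique's actual code: the data layout (per-sample read counts per OTU and a lineage list per OTU), the function names, the rank index, the relative-abundance cutoff, and the `mode` switch standing in for "and/or" are all hypothetical choices made for illustration. The sketch covers only candidate selection; the subsequent screen against SSU rRNA structural models and the phylogenetic placement are not shown.

```python
from typing import Dict, List, Set

# Hypothetical input layouts (not SSUnique's formats):
#   OTU ID -> read counts per sample
#   OTU ID -> taxonomic lineage, e.g. ["Bacteria", "Firmicutes", ...]
OtuTable = Dict[str, List[int]]
Taxonomy = Dict[str, List[str]]


def is_unclassified(lineage: List[str], rank_index: int) -> bool:
    """True if the OTU lacks an assignment at the requested rank."""
    if rank_index >= len(lineage):
        return True  # lineage stops before the requested rank
    label = lineage[rank_index].strip().lower()
    return label == "" or label.startswith("unclassified")


def is_rare(counts: List[int], total_reads: int, max_rel_abundance: float) -> bool:
    """True if the OTU's share of all reads is at or below the cutoff."""
    return sum(counts) / total_reads <= max_rel_abundance


def select_candidate_otus(
    table: OtuTable,
    taxonomy: Taxonomy,
    rank_index: int = 1,              # hypothetical: 1 = phylum in this layout
    max_rel_abundance: float = 0.001,  # hypothetical rarity cutoff
    mode: str = "or",                  # "and": both criteria; "or": either one
) -> Set[str]:
    """Return IDs of OTUs that are unclassified and/or rare."""
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    total_reads = sum(sum(counts) for counts in table.values())
    if total_reads == 0:
        raise ValueError("OTU table contains no reads")
    selected: Set[str] = set()
    for otu_id, counts in table.items():
        unclassified = is_unclassified(taxonomy.get(otu_id, []), rank_index)
        rare = is_rare(counts, total_reads, max_rel_abundance)
        keep = (unclassified and rare) if mode == "and" else (unclassified or rare)
        if keep:
            selected.add(otu_id)
    return selected


# Usage with toy data (1002 reads in total across two samples)
table = {"OTU1": [600, 300], "OTU2": [40, 59], "OTU3": [1, 0], "OTU4": [2, 0]}
taxonomy = {
    "OTU1": ["Bacteria", "Proteobacteria"],
    "OTU2": ["Bacteria", "unclassified"],
    "OTU3": ["Bacteria", "Firmicutes"],
    "OTU4": ["Bacteria"],  # no phylum-level assignment
}
print(sorted(select_candidate_otus(table, taxonomy, rank_index=1, max_rel_abundance=0.005)))
# ['OTU2', 'OTU3', 'OTU4']
print(sorted(select_candidate_otus(table, taxonomy, rank_index=1, max_rel_abundance=0.005, mode="and")))
# ['OTU4']
```

The "or" mode casts a wider net (any unclassified or any rare OTU), while "and" restricts attention to rare OTUs that also lack a classification; either set would then go on to structural screening before any claim of novelty.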
Extensive SSU rRNA gene sequence libraries, constructed from DNA extracts of environmental or host-associated samples, often contain numerous unclassified sequences, many of which represent organisms with novel taxonomy (taxonomic "blind spots") and potentially unique ecology. This novelty is poorly explored by standard workflows, which limits the breadth and discovery potential of such studies. Here we present the SSUnique analysis pipeline, which will promote the exploration of unclassified diversity in microbiome research and, importantly, enable the discovery of substantially novel taxonomic lineages through the analysis of a wide variety of existing data sets.