Center for Microbial Ecology, Michigan State University, East Lansing, Michigan, USA.
mBio. 2013 Sep 17;4(5):e00592-13. doi: 10.1128/mBio.00592-13.
Biological nitrogen fixation is an important component of sustainable soil fertility and a key component of the nitrogen cycle. We used targeted metagenomics to study the nitrogen fixation-capable terrestrial bacterial community by targeting the gene for nitrogenase reductase (nifH). We obtained 1.1 million nifH 454 amplicon sequences from 222 soil samples collected from 4 National Ecological Observatory Network (NEON) sites in Alaska, Hawaii, Utah, and Florida. To accurately detect and correct frameshifts caused by indel sequencing errors, we developed FrameBot, a tool for frameshift correction and nearest-neighbor classification, and compared its accuracy to that of two other rapid frameshift correction tools. We found FrameBot was, in general, more accurate as long as a reference protein sequence with 80% or greater identity to a query was available, as was the case for virtually all nifH reads for the 4 NEON sites. Frameshifts were present in 12.7% of the reads. Those nifH sequences related to the Proteobacteria phylum were most abundant, followed by those for Cyanobacteria in the Alaska and Utah sites. Predominant genera with nifH sequences similar to reads included Azospirillum, Bradyrhizobium, and Rhizobium, the latter two without obvious plant hosts at the sites. Surprisingly, 80% of the sequences had greater than 95% amino acid identity to known nifH gene sequences. These samples were grouped by site and correlated with soil environmental factors, especially drainage, light intensity, mean annual temperature, and mean annual precipitation. FrameBot was tested successfully on three ecofunctional genes but should be applicable to any.
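To illustrate why indel sequencing errors matter for protein-coding amplicons such as nifH, the minimal Python sketch below translates a short DNA fragment before and after a single-base deletion. The fragment is made up for illustration and is not a real nifH sequence; the helper name `translate` is likewise an assumption introduced here. The standard genetic code is used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical 30-base fragment (illustration only).
original = "ATGGCTAAACGTGAAATCGGTCTGTACGGC"

# Simulate an indel sequencing error: drop the base at index 7.
shifted = original[:7] + original[8:]

print(translate(original))  # MAKREIGLYG
print(translate(shifted))   # MANVKSVCT
```

Only the first two residues survive the deletion; every codon downstream of the error is read in the wrong frame, which is the kind of garbling that frameshift correction aims to undo.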
High-throughput phylogenetic analysis of microbial communities using rRNA-targeted sequencing is now commonplace; however, such data often allow little inference with respect to either the presence or the diversity of genes involved in most important ecological processes. To study the gene pool for these processes, it is more straightforward to assess the genes directly responsible for the ecological function (ecofunctional genes). However, analyzing these genes involves technical challenges beyond those seen for rRNA. In particular, frameshift errors cause garbled downstream protein translations. Our FrameBot tool described here both corrects frameshift errors in query reads and determines their closest matching protein sequences in a set of reference sequences. We validated this new tool with sequences from defined communities and demonstrated the tool's utility on nifH gene fragments sequenced from soils in well-characterized and major terrestrial ecosystem types.
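The two tasks named above, correcting a frameshift and finding the closest reference protein, can be sketched together in a few lines of Python. This is a brute-force toy under stated assumptions, not FrameBot's algorithm, which this abstract does not describe: it handles at most one indel per read, assumes the read starts at the first residue of each reference, and scores by ungapped position-wise identity. All names (`repair_single_indel`, `classify`, `ref_A`, `ref_B`) and all sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate in frame 0; codons containing N become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def identity(a, b):
    """Ungapped position-wise identity, normalized by the longer sequence."""
    if not a or not b:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def repair_single_indel(read, ref_protein):
    """Try every single-base fix and keep the one whose translation best
    matches the reference (assumption: at most one indel per read)."""
    candidates = [read]
    # Undo a deletion by inserting a placeholder base N at each position.
    candidates += [read[:i] + "N" + read[i:] for i in range(len(read) + 1)]
    # Undo an insertion by removing each base in turn.
    candidates += [read[:i] + read[i + 1:] for i in range(len(read))]
    return max(candidates, key=lambda c: identity(translate(c), ref_protein))


def classify(read, references):
    """Return (name, identity, corrected protein) for the nearest reference."""
    best = None
    for name, ref in references.items():
        protein = translate(repair_single_indel(read, ref))
        score = identity(protein, ref)
        if best is None or score > best[1]:
            best = (name, score, protein)
    return best


# Hypothetical reference proteins and a read carrying a one-base deletion.
references = {"ref_A": "MAKREIGLYG", "ref_B": "MSDKLRQIAF"}
read = "ATGGCTAACGTGAAATCGGTCTGTACGGC"

name, score, protein = classify(read, references)
print(name, round(score, 2), protein)  # ref_A 0.9 MAXREIGLYG
```

The repaired read regains the downstream residues of `ref_A`, with `X` marking the codon that contains the placeholder base. A practical tool would replace the exhaustive search with a frameshift-aware alignment and would typically report a match only when identity to the nearest reference is high enough; the abstract cites 80% as the level above which FrameBot was generally more accurate than the tools it was compared with.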