Department of Computer Science, Rice University, Houston, TX, USA.
Department of Systems, Synthetic and Physical Biology Science, Rice University, Houston, TX, USA.
Nat Methods. 2022 Jul;19(7):845-853. doi: 10.1038/s41592-022-01520-4. Epub 2022 Jun 30.
16S ribosomal RNA-based analysis is the established standard for elucidating the composition of microbial communities. Short-read 16S rRNA analyses are largely limited to genus-level resolution because only a portion of the gene is sequenced, whereas full-length 16S rRNA gene amplicon sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate often observed in long-read data. Here we present Emu, an approach that uses an expectation-maximization algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results from simulated datasets and mock communities show that Emu profiles microbial communities accurately while producing fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of Emu by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow with those obtained from full-length 16S rRNA gene sequences processed with Emu.
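The general idea of estimating taxonomic abundances with expectation-maximization can be illustrated with a minimal sketch. This is not Emu's implementation: the function name `em_abundance`, its parameters, and the input matrix of per-read likelihoods are all assumptions made for illustration, and the abstract does not describe how such likelihoods are derived from alignments. The sketch only shows the generic loop: ambiguous reads are split across candidate taxa in proportion to the current abundance estimates (E-step), and the abundances are then re-estimated from those fractional assignments (M-step).

```python
def em_abundance(likelihoods, n_iter=100, tol=1e-9):
    """Generic EM sketch for relative abundance estimation.

    likelihoods[r][t] is a hypothetical P(read r | taxon t); how such
    values would be computed from alignments is outside this sketch.
    Returns a list of estimated relative abundances, one per taxon.
    """
    n_taxa = len(likelihoods[0])
    # Start from a uniform abundance profile.
    abundance = [1.0 / n_taxa] * n_taxa

    for _ in range(n_iter):
        # E-step: split each read across taxa in proportion to
        # (current abundance) x (read likelihood under that taxon).
        totals = [0.0] * n_taxa
        for row in likelihoods:
            weights = [a * p for a, p in zip(abundance, row)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # read is compatible with no taxon; ignore it
            for t, w in enumerate(weights):
                totals[t] += w / norm

        assigned = sum(totals)
        if assigned == 0.0:
            break  # no read could be assigned to any taxon

        # M-step: new abundances are the normalized fractional read counts.
        updated = [x / assigned for x in totals]
        delta = max(abs(u - a) for u, a in zip(updated, abundance))
        abundance = updated
        if delta < tol:
            break  # estimates have stopped changing

    return abundance


# Toy input: two reads unique to taxon 0, one unique to taxon 1,
# and one read equally compatible with both.
profile = em_abundance([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

In this toy input the ambiguous read is gradually attributed mostly to the taxon that the unambiguous reads already favor, so the estimates settle near two-thirds and one-third.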