Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway.
Akvaplan-Niva, Tromsø, Norway.
BMC Bioinformatics. 2024 Jul 12;25(1):237. doi: 10.1186/s12859-024-05837-z.
With the emergence of Oxford Nanopore technology, on-site sequencing of 16S rRNA from environmental samples is now possible. Because of the error rate and error structure of such reads, their analysis requires a database of reference sequences. However, many taxa from complex and diverse environments are poorly represented in publicly available databases. In this paper, we propose the METASEED pipeline for reconstructing full-length 16S sequences from such environments, in order to improve the reference available for subsequent on-site sequencing.
We show that combining high-precision short-read sequencing of both 16S amplicons and the full metagenome from the same samples allows us to reconstruct high-quality 16S sequences from the more abundant taxa. A significant novelty is the carefully designed collection of metagenome reads that match the 16S amplicons, based on a combination of uniqueness and abundance. Compared to alternative approaches, this produces superior results.
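The idea of collecting metagenome reads by uniqueness and abundance can be illustrated with a minimal sketch. This is not the METASEED implementation: the function names (`kmers`, `unique_kmers`, `recruit_reads`), the k-mer matching scheme, and the parameters `k` and `min_hits` are all assumptions made for illustration. Each read is assigned to the amplicon with which it shares the most k-mers found in no other amplicon (uniqueness), and ties are broken in favour of the amplicon with the higher read count (abundance).

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(amplicons, k):
    """Map each amplicon id to the k-mers that occur in no other amplicon."""
    owners = defaultdict(set)
    for amp_id, seq in amplicons.items():
        for km in kmers(seq, k):
            owners[km].add(amp_id)
    unique = {amp_id: set() for amp_id in amplicons}
    for km, ids in owners.items():
        if len(ids) == 1:
            unique[next(iter(ids))].add(km)
    return unique


def recruit_reads(amplicons, abundance, reads, k=4, min_hits=2):
    """Assign each read to at most one amplicon (hypothetical scheme).

    amplicons: dict of amplicon id -> 16S amplicon sequence
    abundance: dict of amplicon id -> read count, used only to break ties
    reads:     iterable of metagenome read sequences
    """
    unique = unique_kmers(amplicons, k)
    recruited = {amp_id: [] for amp_id in amplicons}
    for read in reads:
        read_kmers = kmers(read, k)
        best_id, best_key = None, None
        for amp_id, uniq in unique.items():
            hits = len(read_kmers & uniq)
            if hits < min_hits:
                continue  # too little amplicon-specific evidence
            key = (hits, abundance.get(amp_id, 0))
            if best_key is None or key > best_key:
                best_id, best_key = amp_id, key
        if best_id is not None:
            recruited[best_id].append(read)
    return recruited


amplicons = {"A": "AAAACCCCGGGG", "B": "AAAATTTTGGGG"}
abundance = {"A": 10, "B": 5}
reads = ["AACCCCGG", "ATTTTGG", "AAAA"]
result = recruit_reads(amplicons, abundance, reads)
```

In this toy input, the read `AAAA` matches only a k-mer shared by both amplicons, so it is left unassigned. A real pipeline would also need to handle reverse complements, sequencing errors, and reads that extend beyond the amplified region, none of which this sketch attempts.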
Our pipeline will facilitate studies of the many microorganisms that remain uncharacterized, and thereby improve our understanding of diverse environments. The pipeline is a potential tool for generating a full-length 16S rRNA gene database for any environment.