Philip Melcy, Nilsen Tonje, Majaneva Sanna, Pettersen Ragnhild, Stokkan Morten, Ray Jessica Louise, Keeley Nigel, Rudi Knut, Snipen Lars-Gustav
Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway.
Akvaplan-niva, Framsenteret, Tromsø, Norway.
Mol Ecol Resour. 2025 Aug 25:e70036. doi: 10.1111/1755-0998.70036.
The Oxford Nanopore Technologies (ONT) sequencing platform is compact and efficient, making it suitable for rapid biodiversity assessments in remote areas. Despite its long reads, ONT has a higher error rate than other platforms, which necessitates high-quality reference databases for accurate taxonomic assignment. However, the absence of targeted databases for underexplored habitats, such as the seafloor, limits ONT's broader applicability for exploratory analysis. To address this, we propose an approach for building environmentally targeted databases to improve ONT-based 16S rRNA gene (16S) analysis, using seafloor sediment samples from the Norwegian coast as an example. We first used Illumina short-read data to create a database of full-length or near-full-length 16S sequences from seafloor samples. Amplicons were initially mapped to the SILVA database, and matches were added to our database. Unmatched amplicons were reconstructed using METASEED and Barrnap methodologies with amplicon and metagenome data. Finally, if neither strategy succeeded, the short-read sequences themselves were included in the database. This resulted in AQUAeD-DB, which contains 14,545 16S sequences clustered at 95% identity. Comparative database analysis revealed that AQUAeD-DB provides consistent results between Illumina and Nanopore read assignments (median correlation coefficient: 0.50), whereas a standard database showed a substantially weaker correlation. These findings also emphasise its potential to recognise both high- and low-abundance taxa, which could be key indicators in environmental studies. This work highlights the necessity of targeted databases for environmental analysis, especially for ONT-based studies, and lays the foundations for future extension of the database.
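The three-tier strategy for assembling reference sequences (SILVA match first, then reconstruction, then the short read itself as a last resort) can be read as a per-amplicon fallback chain. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: `silva_match` and `reconstruct` are hypothetical callables standing in for SILVA mapping and METASEED/Barrnap reconstruction (external tools in practice), and the final clustering at 95% identity is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    amplicon_id: str
    sequence: str
    source: str  # "silva", "reconstructed", or "short_read"


def build_reference(
    amplicons: dict[str, str],
    silva_match: Callable[[str], Optional[str]],  # hypothetical: full-length SILVA hit or None
    reconstruct: Callable[[str], Optional[str]],  # hypothetical: reconstructed 16S or None
) -> list[Entry]:
    """Assign each amplicon a reference sequence using a tiered fallback."""
    db: list[Entry] = []
    for amp_id, seq in amplicons.items():
        # Tier 1: keep the full-length sequence the amplicon maps to in SILVA.
        hit = silva_match(seq)
        if hit is not None:
            db.append(Entry(amp_id, hit, "silva"))
            continue
        # Tier 2: try to reconstruct a longer 16S sequence from other data.
        rebuilt = reconstruct(seq)
        if rebuilt is not None:
            db.append(Entry(amp_id, rebuilt, "reconstructed"))
            continue
        # Tier 3: fall back to the short-read sequence itself.
        db.append(Entry(amp_id, seq, "short_read"))
    return db


# Toy lookups (made-up sequences) stand in for the real tools.
silva = {"ACGT": "ACGTACGTAAAA"}
rebuilt = {"GGCC": "GGCCTTGGCCAA"}
amplicons = {"asv1": "ACGT", "asv2": "GGCC", "asv3": "TTTT"}
db = build_reference(amplicons, silva.get, rebuilt.get)
for entry in db:
    print(entry.amplicon_id, entry.source, entry.sequence)
```

Passing the lookups as callables keeps the fallback order explicit and lets each tier be swapped for a real tool wrapper without changing the control flow.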