Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark.
Department of Organismal Biology (Systematic Biology), Uppsala University, Uppsala, Sweden.
Mol Ecol Resour. 2024 Oct;24(7):e13991. doi: 10.1111/1755-0998.13991. Epub 2024 Jul 9.
The use of short-read metabarcoding for classifying microeukaryotes is challenged by the lack of comprehensive 18S rRNA reference databases. While recent advances in high-throughput long-read sequencing have the potential to greatly increase the phylogenetic coverage of these databases, the performance of different sequencing technologies and of the subsequent bioinformatics processing remains to be evaluated, primarily because of the absence of well-defined eukaryotic mock communities. To address this challenge, we created a eukaryotic rRNA operon clone library and turned it into a precisely defined synthetic eukaryotic mock community. This mock community was then used to evaluate the performance of three long-read sequencing strategies (PacBio circular consensus sequencing and two Nanopore approaches using unique molecular identifiers) and three tools for resolving amplicon sequence variants (ASVs) (USEARCH, VSEARCH, and DADA2). We investigated the sensitivity of the sequencing techniques based on the number of detected mock taxa, and the accuracy of the different ASV-calling tools with a specific focus on the presence of chimeras among the final rRNA operon ASVs. Based on our findings, we provide recommendations and best-practice protocols for how to cost-effectively obtain essentially error-free rRNA operons at high throughput. An agricultural soil sample was used to demonstrate that the sequencing and bioinformatic results from the mock community also translate to highly diverse natural samples, which enables us to identify previously undescribed microeukaryotic lineages.
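The two evaluation criteria named in the abstract (sensitivity as the number of detected mock taxa, and the presence of chimeras among the final ASVs) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy sequences, and the exact-string chimera test are assumptions made for illustration only. Tools such as USEARCH, VSEARCH, and DADA2 use alignment-based or model-based methods rather than the exact prefix/suffix matching shown here.

```python
from typing import Dict, List

# Hypothetical toy mock community: taxon name -> reference sequence.
MOCK_REFS: Dict[str, str] = {
    "taxA": "AAAACCCC",
    "taxB": "GGGGTTTT",
    "taxC": "ACGTACGT",
}


def classify_asv(asv: str, refs: Dict[str, str]) -> str:
    """Label an ASV as 'exact:<taxon>', 'chimera:<a>+<b>', or 'unmatched'."""
    # An error-free ASV is identical to one mock reference.
    for name, seq in refs.items():
        if asv == seq:
            return f"exact:{name}"
    # Toy two-parent chimera test (an assumption of this sketch):
    # the ASV starts like one reference and ends like a different one.
    for i in range(1, len(asv)):
        head, tail = asv[:i], asv[i:]
        for a, seq_a in refs.items():
            if not seq_a.startswith(head):
                continue
            for b, seq_b in refs.items():
                if b != a and seq_b.endswith(tail):
                    return f"chimera:{a}+{b}"
    return "unmatched"


def sensitivity(asvs: List[str], refs: Dict[str, str]) -> float:
    """Fraction of mock taxa recovered as at least one exact ASV."""
    labels = (classify_asv(s, refs) for s in asvs)
    detected = {lab.split(":", 1)[1] for lab in labels if lab.startswith("exact:")}
    return len(detected) / len(refs)


# Hypothetical ASV set: two exact taxa, one chimera, one unrelated sequence.
asvs = ["AAAACCCC", "GGGGTTTT", "AAAATTTT", "TTTTTTTT"]
for s in asvs:
    print(s, classify_asv(s, MOCK_REFS))
print("sensitivity:", round(sensitivity(asvs, MOCK_REFS), 3))
```

With real full-length rRNA operons, the exact-match step would typically be replaced by an alignment against the mock references with a small tolerance for residual errors.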