Investigating fungal diversity through metabarcoding for environmental samples: assessment of ITS1 and ITS2 Illumina sequencing using multiple defined mock communities with different classification methods and reference databases.

Authors

Winand Raf, D'hooge Elizabet, Van Uffelen Alexander, Bogaerts Bert, Van Braekel Julien, Hoffman Stefan, Roosens Nancy H C J, Becker Pierre, De Keersmaecker Sigrid C J, Vanneste Kevin

Affiliations

Transversal activities in Applied Genomics, Sciensano, 1050, Brussels, Belgium.

BCCM/IHEM collection, Mycology and Aerobiology, Sciensano, 1050, Brussels, Belgium.

Publication

BMC Genomics. 2025 Aug 6;26(1):729. doi: 10.1186/s12864-025-11917-y.

Abstract

An important challenge in taxonomic classification of environmental samples is capturing the real diversity by identifying all species present in a sample. Metabarcoding approaches are often employed to identify species in complex samples. The internal transcribed spacer (ITS) region is the official, widely adopted barcode for identifying fungal species. Metabarcoding can be done in many different ways, with multiple choices at different steps of the workflow. We present a comparative evaluation of the sequenced region (ITS1 and/or ITS2), two different reference databases (UNITE versus BCCM/IHEM), two different bioinformatics software packages (BLAST versus mothur), and the considered taxonomic level (species versus genus level), to accurately capture the diversity using 37 fungal defined mock communities (DMCs). The DMCs cover a broad range of fungal diversity, including 42 Ascomycota species (26 genera), 4 Basidiomycota species (4 genera), and 5 Mucoromycota species (5 genera), all commonly found in indoor environments in Western Europe. Classification performance was first evaluated using ITS1 and ITS2 sequences of all species in the DMCs, generated by Sanger sequencing, to evaluate the discriminatory power of ITS and set a baseline for subsequent comparison with Illumina sequencing. Classification performance varied with all considered variables (sequencing technology, taxonomic level, ITS region, software, database), with 56-100% of species correctly assigned. Sanger sequencing showed that neither ITS1 nor ITS2 resulted in optimal performance, due to their low discriminatory power within certain genera. Compared to Sanger sequencing, Illumina sequencing generally resulted in lower precision but comparable recall. Classification performance was generally good at genus but not at species level, although intermediate taxonomic levels could present adequate alternatives. ITS2 typically resulted in slightly better precision and comparable recall compared to ITS1. The employed reference database had a marked effect, with BCCM/IHEM performing better than UNITE due to the difference in number of sequences in each database. BLAST resulted in better performance but required expert curation, whereas mothur performed better when using an automated workflow. Estimating species abundances using Illumina sequencing read counts generally performed poorly, although read abundance filtering could increase the precision of ITS1, but not ITS2. Each approach comes with its own advantages and inconveniences and should be carefully selected based on the objectives of the analysis. Our results highlight the power of metabarcoding using Illumina sequencing for investigating fungal diversity in complex samples and can guide scientists in selecting the most appropriate setup for their own purposes.
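The abstract reports precision and recall against defined mock communities and notes that read abundance filtering can raise precision. The following Python sketch illustrates how such metrics could be computed for a single mock community. It is not the authors' pipeline: the function names, the 2% relative-abundance threshold, the species names, and the read counts are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def filter_by_relative_abundance(read_counts: Dict[str, int],
                                 min_fraction: float) -> Set[str]:
    """Keep taxa whose share of all reads is at least min_fraction."""
    total = sum(read_counts.values())
    if total == 0:
        return set()
    return {taxon for taxon, n in read_counts.items()
            if n / total >= min_fraction}


def precision_recall(expected: Set[str],
                     detected: Set[str]) -> Tuple[float, float]:
    """Precision = TP / detected taxa; recall = TP / expected taxa."""
    true_positives = len(expected & detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical mock community and classifier output (not data from the study).
expected = {
    "Aspergillus fumigatus",
    "Penicillium chrysogenum",
    "Cladosporium herbarum",
}
read_counts = {
    "Aspergillus fumigatus": 600,
    "Penicillium chrysogenum": 390,
    "Alternaria alternata": 10,  # low-abundance false positive
}

# Without filtering, every taxon with at least one read counts as detected.
unfiltered = precision_recall(expected, set(read_counts))

# With an assumed 2% threshold, the low-abundance false positive is dropped.
filtered = precision_recall(
    expected, filter_by_relative_abundance(read_counts, min_fraction=0.02)
)

print("unfiltered (precision, recall):", unfiltered)
print("filtered   (precision, recall):", filtered)
```

In this toy case filtering removes a false positive, so precision rises while recall is unchanged. A threshold set too high would also remove true low-abundance species and lower recall, which is one reason the benefit of filtering can differ between setups.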

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/933c/12329927/b00dc3a222e3/12864_2025_11917_Fig1_HTML.jpg
