Lim Timothy J Y, Delgado Yussi M Palacios, Lintern Anna, McCarthy David T, Henry Rebekah
Department of Civil & Environmental Engineering, Monash University, Clayton, VIC 3800, Australia.
School of Environmental Sciences, University of Guelph, Guelph, ON N1G 2W1, Canada.
Bioinform Adv. 2025 Apr 29;5(1):vbaf103. doi: 10.1093/bioadv/vbaf103. eCollection 2025.
Understanding the quality of the source library prior to undertaking library-dependent microbial source-tracking (MST) is an essential but often overlooked preliminary analysis step.
We propose an approach for assessing the quality of amplicon-derived faecal source libraries and demonstrate it on a faecal source library of 16S rRNA paired-end amplicon sequences obtained from various animal types in Victoria, Australia. First, a leave-one-out (LOO) analysis was performed to assess the accuracy of the source-category groupings by identifying the number of samples incorrectly assigned to a different source category (i.e. animal type). After a quality-control procedure to decide whether to retain, remove, or group incorrectly assigned samples, we assessed whether the sample size for each source type was sufficient to characterize its source fingerprint properly. The LOO results showed that 15.5% of samples were incorrectly assigned, with high error rates for birds and wallabies in our source library. Increasing the sample size improved source-identification accuracy; however, accuracy eventually plateaued in a source-specific manner. These findings highlight the importance of thoroughly assessing the quality and limitations of a source library before applying library-dependent MST.
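The LOO step can be illustrated with a minimal sketch. It assumes each sample is a vector of ASV counts with a source-category label, and it replaces SourceTracker's Bayesian source-proportion estimate with a much simpler stand-in: a nearest-centroid rule on Bray-Curtis dissimilarity. All function names (`leave_one_out`, `source_fingerprints`, `per_category_error`) and the toy data are hypothetical; the LOO pipeline linked in the availability statement is the authoritative implementation.

```python
from collections import Counter, defaultdict


def relative_abundance(counts):
    """Convert raw ASV counts to proportions (all zeros if the sample is empty)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def source_fingerprints(samples, labels, skip=None):
    """Mean relative-abundance profile per source category, optionally excluding one sample."""
    groups = defaultdict(list)
    for i, (counts, label) in enumerate(zip(samples, labels)):
        if i != skip:
            groups[label].append(relative_abundance(counts))
    return {label: [sum(col) / len(profiles) for col in zip(*profiles)]
            for label, profiles in groups.items()}


def leave_one_out(samples, labels):
    """Hold out each sample in turn, rebuild fingerprints from the rest, and
    record samples whose nearest fingerprint belongs to a different category.
    Stand-in classifier: nearest centroid, not SourceTracker."""
    misassigned = []
    for i, (counts, true_label) in enumerate(zip(samples, labels)):
        fingerprints = source_fingerprints(samples, labels, skip=i)
        profile = relative_abundance(counts)
        predicted = min(fingerprints, key=lambda lab: bray_curtis(profile, fingerprints[lab]))
        if predicted != true_label:
            misassigned.append((i, true_label, predicted))
    return misassigned


def per_category_error(labels, misassigned):
    """Fraction of each category's samples that were assigned elsewhere."""
    totals = Counter(labels)
    wrong = Counter(true_label for _, true_label, _ in misassigned)
    return {label: wrong[label] / totals[label] for label in totals}


# Hypothetical toy library with 3 ASVs; sample 6 looks bird-like but is labelled "cow".
samples = [[10, 0, 0], [8, 1, 0], [9, 0, 1], [0, 10, 0], [1, 9, 0], [0, 8, 2], [0, 9, 1]]
labels = ["cow", "cow", "cow", "bird", "bird", "bird", "cow"]
errors = leave_one_out(samples, labels)
print(errors)                              # [(6, 'cow', 'bird')]
print(per_category_error(labels, errors))  # {'cow': 0.25, 'bird': 0.0}
```

The overall misassignment rate is `len(errors) / len(samples)` (1/7 here). The per-category breakdown is what flags problem sources, analogous to the high bird and wallaby error rates reported above. The flagged samples are then candidates for the retain/remove/group quality-control decision.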
QIIME2 is available via https://qiime2.org/; SourceTracker v2.0.1 is available via https://github.com/caporaso-lab/sourcetracker2; Pipeline for LOO is available via https://github.com/MonashOWL/Bioinformatics-IlluminaMGI/tree/main/16S/LOO; Pipeline for sample size assessment is available via https://github.com/MonashOWL/Bioinformatics-IlluminaMGI/tree/main/16S/Source%20variability.
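The sample-size assessment can be sketched in the same spirit, under stated assumptions. For each candidate size `n`, each source fingerprint is rebuilt from `n` randomly drawn library samples, a fixed held-out test set is classified, and accuracy is averaged over repeated draws. The point where the accuracy-versus-`n` curve levels off suggests how many samples a source needs. The nearest-centroid classifier is again a stand-in for SourceTracker, and `sample_size_curve`, `accuracy_at_size` and the toy data are hypothetical; the linked "Source variability" pipeline may differ in detail.

```python
import random
from statistics import mean


def relative_abundance(counts):
    """Convert raw ASV counts to proportions (all zeros if the sample is empty)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def classify(counts, fingerprints):
    """Assign a sample to the category whose fingerprint is least dissimilar (stand-in rule)."""
    profile = relative_abundance(counts)
    return min(fingerprints, key=lambda lab: bray_curtis(profile, fingerprints[lab]))


def accuracy_at_size(library, test_set, n, reps, rng):
    """Mean accuracy when each fingerprint is built from n randomly drawn
    library samples, averaged over `reps` independent draws."""
    scores = []
    for _ in range(reps):
        fingerprints = {}
        for label, pool in library.items():
            drawn = [relative_abundance(c) for c in rng.sample(pool, min(n, len(pool)))]
            fingerprints[label] = [sum(col) / len(drawn) for col in zip(*drawn)]
        correct = sum(classify(counts, fingerprints) == true_label
                      for counts, true_label in test_set)
        scores.append(correct / len(test_set))
    return mean(scores)


def sample_size_curve(library, test_set, sizes, reps=20, seed=0):
    """Accuracy for each candidate sample size; look for where it plateaus."""
    rng = random.Random(seed)
    return {n: accuracy_at_size(library, test_set, n, reps, rng) for n in sizes}


# Hypothetical toy data: fingerprints are drawn from `library`; `test_set` is held out.
library = {
    "cow": [[10, 0, 0], [8, 1, 0], [9, 0, 1], [7, 2, 1]],
    "bird": [[0, 10, 0], [1, 9, 0], [0, 8, 2], [1, 7, 2]],
}
test_set = [([9, 1, 0], "cow"), ([0, 9, 1], "bird")]
print(sample_size_curve(library, test_set, sizes=[1, 2, 3, 4]))
# {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
```

Because the toy sources are perfectly separable, the curve is flat at 1.0. With real, overlapping faecal communities one would expect accuracy to rise with `n` before plateauing at a source-specific level, as the abstract reports.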