Mlaga Kodjovi D, Mathieu Alban, Beauparlant Charles Joly, Ott Alban, Khodr Ahmad, Perin Olivier, Droit Arnaud
Department of Molecular Medicine, Laval University, Quebec, QC, Canada.
Centre de Recherche du CHU de Québec, Quebec, QC, Canada.
Front Microbiol. 2021 May 5;12:640693. doi: 10.3389/fmicb.2021.640693. eCollection 2021.
In fungal ITS metabarcoding, sequence length variation and non-specific amplicons (including chimeras formed during polymerase chain reaction, PCR), combined with sequencing errors, bias similarity clustering and abundance estimation in downstream analysis. To overcome these challenges, we present two tools: Hierarchical Clustering with Kraken (HCK), a novel approach for classifying ITS1 amplicons, and the Abundance-Base Alternative Approach (ABAA) pipeline, which detects and filters non-specific amplicons in fungal metabarcoding sequencing datasets.
We compared the performance of both pipelines against QIIME, KRAKEN, and DADA2 on publicly available fungal ITS mock community datasets, using BLASTn as the reference. We calculated precision, recall, and F-score from the estimated numbers of true positives, false positives, and false negatives. Alpha diversity (Chao1 and Shannon metrics) was also used to evaluate the diversity estimates produced by our method.
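As an illustration of the evaluation metrics named above, the following minimal sketch derives precision, recall, and F-score from a predicted taxon list and a reference taxon list. It assumes that taxa are compared as exact name matches and that the F-score is the balanced F1; the abstract does not state either detail, and the function name and taxon labels are hypothetical.

```python
def classification_scores(predicted, reference):
    """Return (precision, recall, f_score) for two collections of taxon names.

    Assumption: a taxon counts as a true positive when its name appears in
    both the predicted set and the reference set (exact match).
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # found and expected
    fp = len(predicted - reference)   # found but not expected
    fn = len(reference - predicted)   # expected but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F1: harmonic mean of precision and recall (assumed variant).
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Hypothetical taxon labels, for illustration only.
pipeline_taxa = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]
reference_taxa = ["taxon_A", "taxon_B", "taxon_E"]
precision, recall, f_score = classification_scores(pipeline_taxa, reference_taxa)
```

In this toy case two taxa are shared, two are reported only by the pipeline, and one reference taxon is missed, so precision is 2/4 and recall is 2/3.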
The analysis shows that ABAA reduced the number of false positives with all metabarcoding methods tested, and that HCK increased precision and recall. Compared with QIIME, KRAKEN, and DADA2, HCK coupled with ABAA improved the F-score and brought the alpha diversity values closer to those obtained with BLASTn.
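For reference, the two alpha diversity metrics used in this comparison, Shannon and Chao1, can be computed from a vector of per-taxon read counts. The sketch below is not taken from the HCK or ABAA code; it assumes the natural-log form of the Shannon index and the bias-corrected form of Chao1, and the function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    Assumption: natural logarithm; some tools report log base 2 instead.
    """
    nonzero = [c for c in counts if c > 0]
    total = sum(nonzero)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in nonzero)


def chao1_richness(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    S_obs is the number of observed taxa, F1 the number of singletons
    (count == 1), and F2 the number of doubletons (count == 2).
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical read counts per taxon, for illustration only.
example_counts = [5, 1, 1, 2]
h = shannon_index(example_counts)
richness = chao1_richness(example_counts)
```

Both metrics are sensitive to low-abundance taxa, Chao1 especially because it is driven by singletons and doubletons, which is why spurious non-specific amplicons can inflate diversity estimates.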
The HCK-ABAA approach allows better identification of fungal community structures while avoiding the use of a reference database to filter non-specific amplicons, which makes the methodology more robust and stable over time. The software can be downloaded from: https://bitbucket.org/GottySG36/hck/src/master/.