Jia Yangyang, Zhao Shengguo, Guo Wenjie, Peng Ling, Zhao Fang, Wang Lushan, Fan Guangyi, Zhu Yuanfang, Xu Dayou, Liu Guilin, Wang Ruoqing, Fang Xiaodong, Zhang He, Kristiansen Karsten, Zhang Wenwei, Chen Jianwei
BGI-Shenzhen, Shenzhen, 518083, China.
BGI-Qingdao, BGI-Shenzhen, Qingdao, 266555, China.
Environ Microbiome. 2022 Aug 17;17(1):43. doi: 10.1186/s40793-022-00436-y.
A growing number of studies have shown that rare taxa may make disproportionate functional and ecological contributions to a microbial community. However, the study of the microbial rare biosphere is hampered by its inherent scarcity and by the limitations of currently available techniques. In metabarcoding amplicon sequencing, the most widely used approach, sample index misassignment may introduce sample-wise cross-contamination. Downstream bioinformatic quality control and clustering or denoising algorithms can remove sequencing errors and non-biological artifact reads. However, no algorithm can eliminate the high-quality reads that arise from cross-contamination introduced by index misassignment. This makes it difficult to distinguish bona fide rare taxa from potential false positives in metabarcoding studies.
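The following Python sketch makes the contamination mechanism concrete using a deliberately simplified, deterministic model. The function `misassign`, the sample and taxon names, and the assumption that misassigned reads spread evenly over all other samples are hypothetical and do not come from the study. Real misassignment rates and patterns depend on the platform and library preparation.

```python
from typing import Dict

# Hypothetical toy model: taxon -> expected read count for one sample.
Counts = Dict[str, float]


def misassign(samples: Dict[str, Counts], rate: float) -> Dict[str, Counts]:
    """Simulate index misassignment across samples sequenced in one run.

    Assumption (illustrative only): each sample keeps (1 - rate) of its
    reads, and the remaining `rate` is spread evenly over all other samples.
    """
    names = list(samples)
    if len(names) < 2:
        # No other sample to leak into: return an unchanged copy.
        return {name: dict(counts) for name, counts in samples.items()}
    out: Dict[str, Counts] = {name: {} for name in names}
    share = rate / (len(names) - 1)  # fraction of reads sent to each other sample
    for src, counts in samples.items():
        for taxon, n in counts.items():
            for dst in names:
                frac = (1.0 - rate) if dst == src else share
                if frac > 0:
                    out[dst][taxon] = out[dst].get(taxon, 0.0) + n * frac
    return out


# Two samples that share no taxa before sequencing.
run = {"sample_A": {"taxon_x": 10000}, "sample_B": {"taxon_y": 10000}}
leaked = misassign(run, rate=0.01)
print(leaked["sample_B"])  # taxon_x now appears in sample_B at ~100 reads
```

In this toy run, the leaked reads are ordinary high-quality reads, so quality filtering and denoising would not remove them. As a result, `taxon_x` would look like a genuine rare member of `sample_B` at about 1% relative abundance.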
We thoroughly evaluated the rate of index misassignment on the widely used NovaSeq 6000 and DNBSEQ-G400 sequencing platforms using both commercial and customized mock communities. DNBSEQ-G400 produced a significantly lower fraction of potential false-positive reads than NovaSeq 6000 (0.08% vs. 5.68%). Stochastically introduced false-positive or false-negative rare taxa could cause significant batch effects. These false detections could also inflate the alpha diversity of relatively simple microbial communities and underestimate that of complex ones. Further tests on a set of cow rumen samples showed that the two sequencing platforms detected different rare taxa. Correlation analysis of the rare taxa detected by each platform showed that rare taxa identified by DNBSEQ-G400 were much more likely to correlate with the physicochemical properties of rumen fluid than those identified by NovaSeq 6000. Community assembly mechanism and microbial network correlation analyses indicated that false-positive or false-negative detection of rare taxa could bias the inferred community assembly mechanism and lead to the identification of spurious keystone species.
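The next sketch shows, under stated assumptions, how spurious rare taxa affect a mock-community read count table. It computes a potential false-positive read fraction and two common alpha diversity measures: observed richness and the Shannon index. The function names and counts are hypothetical. The assumption that every read assigned outside the known mock composition is a potential false positive is a simplification, and the study's exact definition may differ.

```python
import math
from typing import Dict, Set

# Hypothetical inputs: taxon -> read count for one mock-community sample.
Counts = Dict[str, int]


def false_positive_fraction(observed: Counts, expected: Set[str]) -> float:
    """Share of reads assigned to taxa NOT in the known mock composition.

    Assumption: every such read is treated as a potential false positive
    (e.g. carried over by index misassignment).
    """
    total = sum(observed.values())
    if total == 0:
        return 0.0
    unexpected = sum(n for taxon, n in observed.items() if taxon not in expected)
    return unexpected / total


def richness(observed: Counts) -> int:
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for n in observed.values() if n > 0)


def shannon(observed: Counts) -> float:
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(observed.values())
    return -sum((n / total) * math.log(n / total) for n in observed.values() if n > 0)


# Illustrative numbers only: a two-member mock plus two intruding taxa.
mock = {"taxon_a", "taxon_b"}
clean = {"taxon_a": 5000, "taxon_b": 5000}
contaminated = {"taxon_a": 4900, "taxon_b": 4900, "taxon_c": 150, "taxon_d": 50}

print(false_positive_fraction(contaminated, mock))  # 0.02
print(richness(clean), richness(contaminated))  # 2 4
print(shannon(clean) < shannon(contaminated))  # True
```

For a simple community like this one, a few intruding taxa inflate both richness and the Shannon index. Conversely, dropping genuine rare taxa (false negatives) lowers richness, which is one way the alpha diversity of a complex community can be underestimated.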
We strongly recommend appropriate positive, negative, and blank controls, technical replicates, and careful sequencing platform selection in future amplicon studies, especially those focusing on the microbial rare biosphere.
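One way blank controls and technical replicates might be combined when screening candidate rare taxa is sketched below. The rule is a hypothetical heuristic chosen for illustration, not the study's published method. Under this rule, a taxon is kept only if it is detected in every technical replicate of a sample and its lowest replicate count exceeds the highest count it reaches in any blank control. The function `supported_taxa` and all counts are invented for the example.

```python
from typing import Dict, List, Set

# Hypothetical inputs: taxon -> read count for one sequencing library.
Counts = Dict[str, int]


def supported_taxa(replicates: List[Counts], blanks: List[Counts]) -> Set[str]:
    """Keep taxa seen in every technical replicate and above all blank counts.

    Hypothetical heuristic for illustration only: a taxon passes if it is
    detected (count > 0) in each replicate and its lowest replicate count
    is strictly greater than its highest count in any blank control.
    """
    if not replicates:
        return set()
    # Highest read count of each taxon across all blank controls.
    blank_max: Dict[str, int] = {}
    for blank in blanks:
        for taxon, n in blank.items():
            blank_max[taxon] = max(blank_max.get(taxon, 0), n)
    # Taxa detected in every technical replicate.
    detected = [{t for t, n in rep.items() if n > 0} for rep in replicates]
    in_all = set.intersection(*detected)
    return {t for t in in_all if min(rep[t] for rep in replicates) > blank_max.get(t, 0)}


reps = [{"taxon_a": 500, "taxon_b": 3, "taxon_c": 20},
        {"taxon_a": 480, "taxon_c": 25, "taxon_d": 2}]
blanks = [{"taxon_c": 30}]
print(supported_taxa(reps, blanks))  # {'taxon_a'}
```

Here `taxon_b` and `taxon_d` are each missing from one replicate, and `taxon_c` never rises above its blank-control level, so only `taxon_a` is retained. Such a strict rule trades some sensitivity for fewer false positives, and real thresholds would need tuning against the controls of a given study.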