Chongqing Jiaotong University, Chongqing, 400074, China; Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China.
Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China; Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.
Water Res. 2023 Nov 1;246:120686. doi: 10.1016/j.watres.2023.120686. Epub 2023 Sep 30.
Effective and standardized monitoring methodologies are vital for successful reservoir restoration and management. Environmental DNA (eDNA) metabarcoding offers a promising alternative for biomonitoring and can overcome many limitations of traditional morphological bioassessment. Recent attempts have even shown that supervised machine learning (SML) can infer biotic indices (BI) directly from eDNA metabarcoding data, bypassing the cumbersome calculation of BI without requiring taxonomic assignment of the eDNA sequences. However, questions about the general applicability of this taxonomy-free approach to monitoring reservoir health remain open, including model stability, feature selection, algorithm choice, and multi-season biomonitoring. Here, we first developed a novel biological integrity index (Me-IBI) that integrates multitrophic interactions and environmental information, based on taxonomy-assigned eDNA metabarcoding data. The Me-IBI distinguishes the actual health status of the Three Gorges Reservoir (TGR) better than physicochemical assessments and responds clearly to human activity. Then, taking this reliable Me-IBI as the supervised label, we compared how the number of selected features and the choice of SML algorithm affect the stability and predictive performance of models that predict ecological conditions across multiple seasons from taxonomy-free eDNA metabarcoding data. We found that, even with a small number of features, different SML algorithms can establish a stable model and achieve excellent predictive performance. Finally, we proposed a four-step strategy for standardized routine biomonitoring using SML tools. Our study is the first to explore the general applicability of the taxonomy-free eDNA-SML approach and establishes a solid foundation for large-scale, standardized biomonitoring applications.
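The comparison described in the abstract (predicting an index label from taxonomy-free features while varying how many features are kept) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the data are synthetic, the features are ranked by absolute Pearson correlation with the label, and a k-nearest-neighbour regressor stands in for whichever SML algorithms the study actually compared. The names `make_synthetic`, `top_k_features`, `knn_predict`, and `evaluate` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def make_synthetic(n_samples=60, n_features=40, n_informative=5, seed=0):
    """Synthetic stand-in for a feature table (samples x sequence features).

    The label is a made-up score in [0, 1]; only the first
    `n_informative` columns carry signal, the rest are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        score = rng.random()
        row = [
            score + rng.gauss(0, 0.1) if j < n_informative else rng.random()
            for j in range(n_features)
        ]
        X.append(row)
        y.append(score)
    return X, y


def top_k_features(X, y, k):
    """Indices of the k columns most correlated (in absolute value) with the label."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(X_train, y_train, x, n_neighbors=5):
    """Mean label of the n_neighbors training samples closest to x (Euclidean)."""
    nearest = sorted((math.dist(xt, x), yt) for xt, yt in zip(X_train, y_train))
    nearest = nearest[:n_neighbors]
    return sum(yt for _, yt in nearest) / len(nearest)


def evaluate(X, y, k, n_train=40):
    """Select k features on the training split only, then score the held-out split."""
    X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    feats = top_k_features(X_tr, y_tr, k)
    P_tr = [[row[j] for j in feats] for row in X_tr]
    P_te = [[row[j] for j in feats] for row in X_te]
    preds = [knn_predict(P_tr, y_tr, x) for x in P_te]
    return pearson(preds, y_te)


X, y = make_synthetic()
results = {k: evaluate(X, y, k) for k in (2, 5, 20, 40)}
for k, r in results.items():
    print(f"{k:>2} features: held-out correlation r = {r:.2f}")
```

Features are ranked on the training split only, so the held-out correlation is not inflated by information from the test samples. In practice the same loop could be repeated over several algorithms and over per-season splits to probe the model stability the abstract refers to.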