Laboratory of Parasitology, Veterinary Research Institute, Hellenic Agricultural Organization - DIMITRA, Thermi, Thessaloniki 57001, Greece.
Hephaestus Laboratory, School of Chemistry, Faculty of Sciences, Democritus University of Thrace, Kavala GR-65404, Greece.
Water Res. 2024 Sep 15;262:122110. doi: 10.1016/j.watres.2024.122110. Epub 2024 Jul 22.
Cryptosporidium and Giardia are important parasitic protozoa because of their zoonotic potential and impact on human health, and they have often caused waterborne disease outbreaks. Detecting (oo)cysts in water matrices is challenging and extremely costly, so only a few countries have legislated for regular monitoring of drinking water for their presence. Several studies have investigated the association between the presence of (oo)cysts in water and other biotic or abiotic factors, with inconclusive findings. The aim of this study was therefore to develop a holistic approach that leverages Machine Learning (ML) and eXplainable Artificial Intelligence (XAI) techniques to provide empirical evidence on the presence and prediction of Cryptosporidium oocysts and Giardia cysts in water samples. To meet this objective, we first modelled the complex relationship between Cryptosporidium and Giardia (oo)cysts and a set of parasitological, microbiological, physicochemical and meteorological parameters via a model-agnostic meta-learner algorithm, which leaves open the choice of the ML model that performs the fitting task. Based on this generic approach, four well-known ML candidates were evaluated empirically in terms of their predictive capabilities. The best-performing algorithms were then examined further through XAI techniques to gain meaningful insights into the explainability and interpretability of the derived solutions. The findings reveal that Random Forest achieves the highest prediction performance for both contamination and contamination intensity with Cryptosporidium oocysts in a given water sample, with meteorological/physicochemical and microbiological markers being informative, respectively.
For the prediction of contamination with Giardia, eXtreme Gradient Boosting with physicochemical parameters was the most efficient algorithm, whereas Support Vector Regression, which takes into account both microbiological and meteorological markers, was more efficient for evaluating the contamination intensity with cysts. The results of the study indicate that ML and XAI approaches can be a valuable tool for unveiling the complex relationships underlying the presence and contamination intensity of these zoonotic parasites, which could in turn form a basis for the development of monitoring platforms and early warning systems for the prevention of waterborne disease outbreaks.
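The workflow summarised in the abstract (fit several interchangeable candidate models, keep the best performer, then explain it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the two toy model classes stand in for the actual candidates (the abstract names Random Forest, eXtreme Gradient Boosting and Support Vector Regression), and permutation importance is used as one common model-agnostic explanation technique, which is not necessarily the XAI method the authors applied.

```python
import random
from statistics import mean

class MeanBaseline:
    """Toy candidate: predicts the training mean and ignores all features."""
    def fit(self, X, y):
        self.mu = mean(y)
        return self
    def predict(self, X):
        return [self.mu for _ in X]

class SingleFeatureLinear:
    """Toy candidate: least-squares line fitted on one feature column."""
    def __init__(self, col):
        self.col = col
    def fit(self, X, y):
        xs = [row[self.col] for row in X]
        mx, my = mean(xs), mean(y)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope = cov / var if var else 0.0
        self.intercept = my - self.slope * mx
        return self
    def predict(self, X):
        return [self.intercept + self.slope * row[self.col] for row in X]

def mse(y_true, y_pred):
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))

def cross_val_mse(make_model, X, y, k=5):
    """Mean k-fold MSE for any object exposing fit/predict (model-agnostic)."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        X_tr = [r for i, r in enumerate(X) if i not in test_idx]
        y_tr = [t for i, t in enumerate(y) if i not in test_idx]
        X_te = [r for i, r in enumerate(X) if i in test_idx]
        y_te = [t for i, t in enumerate(y) if i in test_idx]
        model = make_model().fit(X_tr, y_tr)
        scores.append(mse(y_te, model.predict(X_te)))
    return mean(scores)

def permutation_importance(model, X, y, col, rng, repeats=10):
    """Average increase in MSE after shuffling one feature column."""
    base = mse(y, model.predict(X))
    increases = []
    for _ in range(repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(y, model.predict(X_perm)) - base)
    return mean(increases)

# Synthetic data (hypothetical): feature 0 drives the target, feature 1 is noise.
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
y = [3.0 * row[0] + rng.gauss(0, 1) for row in X]

# Candidate pool: any factory returning a fit/predict object can be added here.
candidates = {
    "mean_baseline": MeanBaseline,
    "linear_on_feature_0": lambda: SingleFeatureLinear(0),
    "linear_on_feature_1": lambda: SingleFeatureLinear(1),
}
cv_scores = {name: cross_val_mse(make, X, y) for name, make in candidates.items()}
best_name = min(cv_scores, key=cv_scores.get)

# Explain the selected model: which feature does its accuracy depend on?
best_model = candidates[best_name]().fit(X, y)
importances = [permutation_importance(best_model, X, y, c, rng) for c in range(2)]
print(best_name, importances)
```

Because selection and explanation only rely on the `fit`/`predict` interface, the toy classes could in principle be swapped for real estimators without changing the surrounding loop, which is the sense in which such a pipeline is model-agnostic.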