Struniawski Karol, Kozera Ryszard, Trzciński Paweł, Marasek-Ciołakowska Agnieszka, Sas-Paszt Lidia
Institute of Information Technology, Warsaw University of Life Sciences - SGGW, ul. Nowoursynowska 159, 02-776, Warsaw, Poland.
School of Physics, Mathematics and Computing, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, WA, 6009, Australia.
Sci Rep. 2024 Dec 28;14(1):31034. doi: 10.1038/s41598-024-82174-4.
The aim of this research is to create an automated system for identifying soil microorganisms at the genus level from raw microscopic images of monocultural colonies grown in a laboratory environment. The examined genera are Fusarium, Trichoderma, Verticillium, Purpureocillium and Phytophthora. The proposed pipeline works directly on unprocessed microscopic images, avoiding additional sample marking or staining. The methodology comprises several stages: image preprocessing, segmentation to isolate microorganisms from the background, and computation of color and texture features for classification. An Extreme Learning Machine (ELM) model was trained and validated on an extensive dataset of 2866 images from the National Institute of Horticultural Research in Skierniewice. The model achieves high accuracy and computational efficiency compared with other state-of-the-art machine learning methods, e.g., CatBoost, Random Forest and Convolutional Neural Networks. Statistical techniques, including Multivariate Analysis of Variance (MANOVA), were employed to confirm significant differences among the datasets, enhancing the model's robustness. In addition, Shapley Additive Explanations (SHAP) values provided transparency into the model's decision-making process. This approach has the potential to improve early detection and management of soil pathogens, promoting sustainable agriculture and demonstrating the potential of machine learning in environmental monitoring, microbial ecology and industrial microbiology.
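The abstract names an Extreme Learning Machine as the classifier but gives no configuration. As a minimal sketch of the general ELM training rule (a random, never-trained hidden layer whose output weights are solved in closed form by least squares via the Moore-Penrose pseudoinverse), the Python/NumPy code below assumes a sigmoid activation, Gaussian random initialisation and 100 hidden units. These choices, the class name `ExtremeLearningMachine`, and the synthetic feature vectors standing in for the color and texture features are illustrative assumptions, not the authors' settings or data.

```python
import numpy as np


class ExtremeLearningMachine:
    """Hypothetical single-hidden-layer ELM classifier (illustrative sketch).

    Input weights and biases are drawn at random and kept fixed; only the
    output weights are fitted, in closed form, with a pseudoinverse.
    """

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of a random affine projection (assumed activation).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Map class labels (e.g. genus names) to integer indices.
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        # Random input weights and biases, never updated afterwards.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # One-hot target matrix, one column per class.
        T = np.eye(len(self.classes_))[y_idx]
        # Least-squares output weights: beta = pinv(H) @ T.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


# Usage on synthetic stand-in data (NOT the paper's dataset): five Gaussian
# clusters play the role of per-image color/texture feature vectors.
rng = np.random.default_rng(42)
genera = np.array(["Fusarium", "Trichoderma", "Verticillium",
                   "Purpureocillium", "Phytophthora"])
centres = rng.normal(scale=3.0, size=(5, 10))


def sample(n_per_class):
    X = np.vstack([c + rng.normal(scale=0.5, size=(n_per_class, 10))
                   for c in centres])
    y = np.repeat(genera, n_per_class)
    return X, y


X_train, y_train = sample(60)
X_test, y_test = sample(20)
model = ExtremeLearningMachine(n_hidden=100).fit(X_train, y_train)
accuracy = np.mean(model.predict(X_test) == y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because only `beta` is learned, training reduces to a single pseudoinverse, which is one plausible source of the computational efficiency the abstract reports relative to iteratively trained models such as CNNs or boosted trees.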