Holicheva Angelina A, Kozlov Konstantin S, Boiko Daniil A, Kamanin Maxim S, Provotorova Daria V, Kolomoets Nikita I, Ananikov Valentine P
Tula State University, Lenin pr. 92, Tula, 300012, Russia.
Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow, 119991, Russia.
NPJ Biofilms Microbiomes. 2025 Jan 14;11(1):16. doi: 10.1038/s41522-025-00647-4.
Biofilms are critical for understanding environmental processes, developing biotechnology applications, and advancing the medical treatment of various infections. Currently, a key limiting factor in biofilm analysis is the difficulty of obtaining large datasets of fully annotated images. This study introduces a versatile approach for creating synthetic datasets of annotated biofilm images by employing deep generative modeling techniques, including VAEs, GANs, diffusion models, and CycleGAN. Synthetic datasets can significantly improve the training of computer vision models for automated biofilm analysis, as demonstrated with a Mask R-CNN detection model. The approach represents a key advance in biofilm research, offering a scalable solution for generating high-quality training data and for working with different strains of microorganisms at different stages of biofilm formation. Terabyte-scale datasets can be easily generated on personal computers. A web application is provided for the on-demand generation of biofilm images.