Stursa Dominik, Rozsival Pavel, Dolezel Petr
Faculty of Electrical Engineering and Informatics, University of Pardubice, Pardubice, Czechia.
Front Artif Intell. 2024 Dec 13;7:1456844. doi: 10.3389/frai.2024.1456844. eCollection 2024.
This study presents a novel methodology for dataset augmentation in the semantic segmentation of coil-coated surface degradation. Deep convolutional generative adversarial networks (DCGAN) are used to generate synthetic input-target pairs that closely resemble real-world data, with the goal of expanding an existing dataset. The augmented datasets are then used to train two state-of-the-art models, U-net and DeepLabV3, to detect degradation areas around scribes. A series of experiments shows that adding synthetic data improves the models' performance in detecting degradation, especially when the ratio of synthetic to real data is carefully managed: the largest improvements in accuracy and F1-score are achieved when the ratio of synthetic to original data is between 0.2 and 0.5. The study also examines the advantages and limitations of different GAN architectures for dataset expansion, with specific attention to their ability to produce realistic and diverse samples. The approach offers a scalable answer to the difficulty of creating large and diverse annotated datasets for industrial coil coating degradation assessment. Its main contribution is improved model generalization and segmentation accuracy together with a reduced burden of manual data annotation. For industries that rely on coil coatings, these findings enable more efficient and accurate degradation detection.
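As an illustration of the two quantities the abstract relies on, the following minimal sketch shows one way a synthetic-to-original ratio could be applied when assembling a training set, and how a pixel-wise F1-score can be computed for binary masks. It is not the authors' pipeline: the function names `mix_datasets` and `pixel_f1`, the rounding convention, the fixed random seed, and the use of flat 0/1 lists as masks are all assumptions made for this example. The abstract also does not state whether F1 is computed per image or over the whole test set.

```python
import random


def mix_datasets(real, synthetic, ratio, seed=0):
    """Return all real samples plus round(ratio * len(real)) synthetic ones.

    `ratio` is the synthetic-to-original ratio (hypothetical helper;
    the abstract reports the best results for ratios of 0.2 to 0.5).
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    # Assumption: the number of synthetic samples is rounded to an integer.
    n_synth = round(ratio * len(real))
    if n_synth > len(synthetic):
        raise ValueError("not enough synthetic samples for this ratio")
    rng = random.Random(seed)
    picked = rng.sample(list(synthetic), n_synth)  # draw without replacement
    mixed = list(real) + picked
    rng.shuffle(mixed)  # interleave real and synthetic pairs
    return mixed


def pixel_f1(pred, target):
    """F1-score over two flat binary masks (1 = degraded pixel)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0
```

With 100 real pairs and a ratio of 0.3, `mix_datasets` returns 130 pairs, 30 of them synthetic. In a real training setup the list elements would be image and mask pairs, and the mixed list would typically be wrapped in the dataset class of the chosen framework.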