Vasan Vinod, Sridharan Naveen Venkatesh, Vaithiyanathan Sugumaran, Aghaei Mohammadreza
School of Mechanical Engineering (SMEC), Vellore Institute of Technology Chennai Campus, Vandalur Kelambakkam Road, Chennai, 600127, India.
Division of Operation and Maintenance Engineering, Luleå University of Technology, 97187, Luleå, Sweden.
Heliyon. 2024 Oct 3;10(19):e38498. doi: 10.1016/j.heliyon.2024.e38498. eCollection 2024 Oct 15.
This study proposes a vision transformer for detecting visual defects on steel surfaces. The approach uses an open-source image dataset to classify steel surface conditions into six defect categories: crazing, inclusion, rolled-in, pitted surface, scratches, and patches. The defect images are first resized and then fed into a vision transformer, which is trained under different hyperparameter configurations to identify the setting that yields the highest classification performance. The best-performing configuration is then examined using its confusion matrix. The proposed model achieves an overall accuracy of 96.39 % for the detection and classification of steel surface defects. The study also describes the vision transformer architecture in detail and compares the performance of the model with results reported for other approaches in the literature. Vision transformers can serve as standalone approaches and as suitable alternatives to the widely used convolutional neural networks (CNNs), performing complex defect detection and classification tasks in real time and enabling efficient and robust condition monitoring across a wide range of defects.
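The input stage described above, in which a resized image is handed to a vision transformer, can be illustrated with a minimal NumPy sketch of patch tokenization. It is not the authors' implementation: the 224 x 224 x 3 image size, the 16-pixel patch size, the 64-dimensional embedding, and the function names `patchify` and `embed_patches` are all assumptions chosen for illustration, and the projection weights are random stand-ins for parameters that would be learned during training.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, C) -> (gh, gw, p, p, C) -> (gh * gw, p * p * C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches, embed_dim, rng):
    """Project patches linearly, prepend a class token, add position embeddings.

    All weights here are random placeholders (assumption); a trained model
    would learn the projection, the class token, and the position embeddings.
    """
    n, d = patches.shape
    weight = rng.normal(0.0, 0.02, size=(d, embed_dim))
    cls_token = np.zeros((1, embed_dim))
    pos_embed = rng.normal(0.0, 0.02, size=(n + 1, embed_dim))
    tokens = np.concatenate([cls_token, patches @ weight], axis=0)
    return tokens + pos_embed


# Hypothetical resized input: 224 x 224 RGB (size assumed, not from the paper).
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patches = patchify(image, patch_size=16)
tokens = embed_patches(patches, embed_dim=64, rng=rng)
print(patches.shape, tokens.shape)
```

With these assumed sizes, the image becomes a sequence of 196 patches plus one class token, which is the form a transformer encoder consumes. Smaller patches give longer sequences and finer spatial detail at higher compute cost, which is one reason patch size is commonly treated as a tunable hyperparameter.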