Wang Xuhui
College of Fine Arts, Sichuan University of Science & Engineering, Zigong City, 643000, China.
Sci Rep. 2025 May 24;15(1):18127. doi: 10.1038/s41598-025-01949-5.
This study addresses the accuracy and efficiency of sculpture image classification. Because sculpture images are diverse and complex, traditional image processing algorithms capture the intricate shapes and structural features of sculptures poorly, which leads to suboptimal classification and recognition performance. To overcome this challenge, the study proposes an image classification method that combines ResNet50, a deep convolutional neural network (DCNN), with the K-means++ clustering algorithm. ResNet50 is chosen for its strong feature extraction capabilities and its performance in image classification tasks, while K-means++ is selected for its optimized initial centroid selection strategy, which improves the stability and reliability of clustering. A self-attention module is added after the final convolutional layer of ResNet50. This module learns an attention map that indicates which areas of the image the model should focus on in subsequent processing. ResNet50 is built from residual blocks, each containing multiple convolutional layers and a skip connection, so the network learns the difference (residual) between a block's input and output rather than the output directly, which improves performance. ResNet50 first extracts feature vectors from the original images, and these vectors are then fed into the K-means++ algorithm for clustering. K-means++ automatically partitions the feature vectors into categories, achieving unsupervised image classification. The experiments use the CMU-MINE architectural sculpture dataset, with ViT-Base, EfficientNet-B4, and ConvNeXt-Tiny as benchmarks for evaluating the proposed ResNet50 + K-means++ image classification approach. The final model achieves a loss value of 0.155 and a recall of 98.9%, significantly outperforming the other three models.
In conclusion, feature point matching is a crucial step in three-dimensional reconstruction. This study employs an image classification method that combines ResNet50 with the K-means++ algorithm, addressing the accuracy limitations of traditional classification methods and achieving promising classification results.
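The clustering stage of the method can be illustrated with a minimal sketch. It is not the author's implementation: it uses only the Python standard library, and the function names (make_stand_in_features, kmeanspp_init, kmeans) are hypothetical. Small synthetic vectors stand in for the feature vectors that ResNet50 would extract, so the sketch covers only the K-means++ seeding rule and the clustering iterations that follow it.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_stand_in_features(n_per=30, dim=8, seed=1):
    """Synthetic stand-in for CNN features: three tight, well-separated blobs."""
    rng = random.Random(seed)
    points = []
    for center in (0.0, 10.0, 20.0):
        for _ in range(n_per):
            points.append([rng.gauss(center, 0.05) for _ in range(dim)])
    return points


def kmeanspp_init(points, k, rng):
    """K-means++ seeding: the first centroid is drawn uniformly; each later one
    is drawn with probability proportional to its squared distance from the
    nearest centroid chosen so far."""
    centroids = [list(rng.choice(points))]
    d2 = [sq_dist(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centroids.append(list(points[idx]))
        d2 = [min(old, sq_dist(p, centroids[-1])) for old, p in zip(d2, points)]
    return centroids


def kmeans(points, k, iters=50, seed=0):
    """Lloyd iterations started from K-means++ seeds. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = kmeanspp_init(points, k, rng)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    features = make_stand_in_features()
    labels, _ = kmeans(features, k=3)
    print(sorted(labels.count(j) for j in range(3)))
```

In a full pipeline, the synthetic vectors would be replaced by features taken from the network, and the number of clusters would be set to the number of sculpture categories. The distance-weighted seeding spreads the initial centroids apart, which is the stability property the abstract attributes to K-means++.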