School of Computer Engineering, KIIT Deemed to be University, Odisha, India.
Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, NTR District, Andhra Pradesh, India.
Biomed Phys Eng Express. 2024 Nov 6;11(1). doi: 10.1088/2057-1976/ad8c46.
This research presents an integrated framework for automating the classification of pulmonary chest x-ray images. It combines convolutional neural networks (CNNs) with a transformer-based classifier, with the aim of improving both the accuracy and the efficiency of chest x-ray image analysis. A central aspect of the approach is the use of pre-trained networks (VGG16, ResNet50, and MobileNetV2) to create a feature ensemble. A notable innovation is a stacked ensemble technique that combines the outputs of multiple pre-trained models into a single, comprehensive feature representation. In the feature ensemble approach, each image is processed separately by the three pre-trained networks, and a pooled image is extracted just before the flatten layer of each model. This yields three pooled images in 2D grayscale format for each original image. The three pooled images are then stacked to form a 3D image resembling an RGB image, which serves as classifier input in the subsequent analysis stages. Stacking the pooled outputs in this way draws on a broader range of features while keeping the complexity of processing the enlarged feature pool manageable. The study also incorporates the Swin Transformer architecture, known for capturing both local and global features effectively, and further optimizes it using the artificial hummingbird algorithm (AHA). The AHA tunes hyperparameters such as patch size, multi-layer perceptron (MLP) ratio, and channel numbers to maximize classification accuracy. The proposed framework, in which the AHA-optimized Swin Transformer classifies the stacked features, is evaluated on three diverse chest x-ray datasets: VinDr-CXR, PediCXR, and MIMIC-CXR. The observed accuracies of 98.874%, 98.528%, and 98.958%, respectively, indicate the robustness and generalizability of the developed model across various clinical scenarios and imaging conditions.
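The stacking step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how each backbone's pooled output is reduced to a 2D grayscale image, so the channel averaging in `channel_mean_map`, the per-map min-max scaling, the 7x7 spatial size, and every function name below are assumptions made for illustration only; random arrays stand in for the real VGG16, ResNet50, and MobileNetV2 outputs.

```python
import numpy as np


def channel_mean_map(volume: np.ndarray) -> np.ndarray:
    """Reduce an (H, W, C) pooled volume to a 2D (H, W) map.

    Assumption: channel averaging is used here only as a stand-in;
    the abstract does not specify the reduction.
    """
    return volume.mean(axis=-1)


def min_max_scale(feature_map: np.ndarray) -> np.ndarray:
    """Scale a 2D map to [0, 1] so the three channels are comparable."""
    lo, hi = feature_map.min(), feature_map.max()
    if hi == lo:
        # A constant map carries no contrast; return zeros instead of dividing by 0.
        return np.zeros(feature_map.shape, dtype=np.float32)
    return ((feature_map - lo) / (hi - lo)).astype(np.float32)


def stack_pooled_maps(maps: list) -> np.ndarray:
    """Stack three equally sized 2D maps into one (H, W, 3) RGB-like array."""
    if len(maps) != 3:
        raise ValueError("expected exactly three pooled maps, one per backbone")
    if len({m.shape for m in maps}) != 1:
        raise ValueError("all pooled maps must share the same 2D shape")
    return np.stack([min_max_scale(m) for m in maps], axis=-1)


# Placeholder pooled volumes (hypothetical shapes; real ones would come
# from the layer just before flatten in each pre-trained network).
rng = np.random.default_rng(0)
vgg16_pool = rng.random((7, 7, 512))
resnet50_pool = rng.random((7, 7, 2048))
mobilenetv2_pool = rng.random((7, 7, 1280))

stacked = stack_pooled_maps(
    [channel_mean_map(v) for v in (vgg16_pool, resnet50_pool, mobilenetv2_pool)]
)
print(stacked.shape)  # (7, 7, 3)
```

In this sketch each backbone contributes one channel of the stacked array, so a downstream classifier that expects three-channel image input can consume it without architectural changes. A real pipeline would likely also need to resize the stacked array to the classifier's expected input resolution.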