Oral cavity carcinoma detection using BAT algorithm-optimized machine learning models with transfer learning and random sampling.

Author information

Folorunso Sakinat O, Abdulwarith Akinshipo, Adeniyi Abidemi Emmanuel, Aworinde Halleluyah Oluwatobi, Awotunde Joseph Bamidele

Affiliations

Artificial Intelligence Systems Research Group, Department of Computer Science, Olabisi Onabanjo University, Ago-Iwoye, Nigeria.

Department of Oral and Maxillofacial Pathology and Biology, Faculty of Dentistry, University of Lagos, Lagos, Nigeria.

Publication information

Comput Biol Med. 2025 Jun;192(Pt B):110250. doi: 10.1016/j.compbiomed.2025.110250. Epub 2025 May 5.

DOI:10.1016/j.compbiomed.2025.110250

PMID:40328028

Abstract

BACKGROUND

Oral cavity carcinoma remains a major public health concern, where early and accurate detection is vital for improving patient outcomes and survival rates. Current diagnostic systems often face challenges such as limited feature selection capabilities, imbalanced datasets, and computational inefficiencies.

METHODS

This study proposes a novel diagnostic framework, TR-ROS-BAT-ML, that integrates transfer learning, random sampling, and a BAT algorithm-based optimization strategy with ensemble machine learning classifiers. A dataset of 1224 hematoxylin and eosin (H&E)-stained histological images (at 100× and 400× magnification) of normal oral epithelium and oral squamous cell carcinoma (OSCC) was collected from 230 patients using a Leica ICC50 HD microscope camera. Pre-trained deep learning models (NASNetLarge, EfficientNetB7, EfficientNetV2L, EfficientNetV2S, EfficientNetV2M) were employed for feature extraction. To address class imbalance, random oversampling (ROS) was applied. The BAT algorithm, inspired by the echolocation behavior of bats, was used for feature selection and hyperparameter tuning. The optimized features were then classified using XGBoost, AdaBoost, Extra Trees (ET), Histogram-Based Gradient Boosting Classifier (HBGC), and Multilayer Perceptron (MLP) models.
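To make the oversampling and BAT-based feature selection steps more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a simplified binary variant of the bat algorithm with a sigmoid transfer function, and a toy fitness (class-mean gap minus a per-feature cost) in place of a trained classifier. All function names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of smaller classes until all match the largest."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def feature_scores(X, y):
    """Toy relevance per feature: absolute gap between the two class means."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return scores


def fitness(mask, scores, penalty=0.5):
    """Assumed objective: reward relevant features, charge a cost per selected feature."""
    return sum(s - penalty for s, bit in zip(scores, mask) if bit)


def binary_bat_select(scores, n_bats=10, n_iter=50, f_min=0.0, f_max=2.0,
                      alpha=0.9, gamma=0.9, v_max=6.0, seed=0):
    """Simplified binary bat algorithm that maximises fitness() over 0/1 masks."""
    rng = random.Random(seed)
    d = len(scores)
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_bats)]
    vel = [[0.0] * d for _ in range(n_bats)]
    loud = [1.0] * n_bats   # loudness A_i, decays after each accepted move
    rate = [0.0] * n_bats   # pulse rate r_i, grows after each accepted move
    fit = [fitness(p, scores) for p in pos]
    best_i = max(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[best_i][:], fit[best_i]
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            cand = pos[i][:]
            for j in range(d):
                # Sign chosen so each bit is pulled toward the best mask found so far.
                v = vel[i][j] + (best[j] - pos[i][j]) * freq
                vel[i][j] = max(-v_max, min(v_max, v))
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))  # sigmoid transfer
                cand[j] = 1 if rng.random() < prob else 0
            if rng.random() > rate[i]:
                # Local search: flip one random bit of the current best mask.
                cand = best[:]
                k = rng.randrange(d)
                cand[k] = 1 - cand[k]
            cand_fit = fitness(cand, scores)
            if cand_fit > fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                rate[i] = 1.0 - math.exp(-gamma * t)
            if fit[i] > best_fit:
                best, best_fit = pos[i][:], fit[i]
    return best, best_fit


# Hypothetical toy data: 6 stand-in "deep features", 12 samples of class 0, 4 of class 1.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(16)]
y = [0] * 12 + [1] * 4
for row, label in zip(X, y):
    if label == 1:
        row[0] += 3.0  # make the first two features informative
        row[1] += 3.0
X_bal, y_bal = random_oversample(X, y, rng)
scores = feature_scores(X_bal, y_bal)
mask, value = binary_bat_select(scores)
print(mask, round(value, 3))
```

In a pipeline like the one described, the rows of `X` would be feature vectors produced by a pre-trained network, and the fitness would more plausibly be a cross-validated score of the downstream classifier. The abstract does not give those details, so they are left out here.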

RESULTS

The proposed approach achieved high diagnostic performance across multiple model combinations. The best performance was recorded with the optimized ET model using random oversampling, which achieved a recall of 0.992, demonstrating its efficacy in detecting oral lesions. In contrast, the combination of EfficientNetV2S + ROS + MLP yielded the lowest accuracy, at 50.8%. These results confirm the robustness of the TR-ROS-BAT-ML framework in handling imbalanced datasets and optimizing classification performance.
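For readers comparing the two reported metrics, recall and accuracy can be computed from confusion-matrix counts as sketched below. The counts are hypothetical placeholders, not values from the study. On a balanced two-class set, an accuracy near 0.5 corresponds to chance-level prediction.

```python
def recall(tp, fn):
    """Fraction of actual positive cases that the model detects."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that the model classifies correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for illustration only (not from the paper).
tp, fn, tn, fp = 90, 10, 80, 20
print(recall(tp, fn), accuracy(tp, tn, fp, fn))
```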

CONCLUSIONS

This study demonstrates the effectiveness of combining nature-inspired optimization, transfer learning, and ensemble machine learning for enhanced detection of oral cavity carcinoma. The proposed TR-ROS-BAT-ML framework offers a scalable, accurate, and efficient diagnostic tool with potential for real-time implementation. Future research will focus on integrating multi-modal data and further optimization to enhance its clinical applicability and impact in AI-driven healthcare solutions.
