Ochoa-Ornelas Raquel, Gudiño-Ochoa Alberto, García-Rodríguez Julio Alberto
Systems and Computation Department, Tecnológico Nacional de México/Instituto Tecnológico de Ciudad Guzmán, Ciudad Guzmán 49100, Mexico.
Electronics Department, Tecnológico Nacional de México/Instituto Tecnológico de Ciudad Guzmán, Ciudad Guzmán 49100, Mexico.
Cancers (Basel). 2024 Nov 11;16(22):3791. doi: 10.3390/cancers16223791.
Lung and colon cancers are among the most prevalent and lethal malignancies worldwide, underscoring the urgent need for advanced diagnostic methodologies. This study aims to develop a hybrid deep learning and machine learning framework for the classification of Colon Adenocarcinoma, Colon Benign Tissue, Lung Adenocarcinoma, Lung Benign Tissue, and Lung Squamous Cell Carcinoma from histopathological images.
Current approaches rely primarily on the LC25000 dataset, which consists largely of augmented images and therefore lacks the generalizability required for real-time clinical applications. To address this, Contrast Limited Adaptive Histogram Equalization (CLAHE) was applied to enhance image quality, and 1000 new images from the National Cancer Institute GDC Data Portal were added to the Colon Adenocarcinoma, Lung Adenocarcinoma, and Lung Squamous Cell Carcinoma classes, replacing augmented images to increase dataset diversity. A hybrid feature extraction model combining MobileNetV2 and EfficientNetB3 was optimized using the Grey Wolf Optimizer (GWO), yielding the lung and colon cancer histopathological classification technique MEGWO-LCCHC. Several machine learning models, including XGBoost, LightGBM, and CatBoost, were then evaluated with cross-validation and tuned with Optuna.
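The GWO step can be illustrated with a minimal sketch of the standard continuous Grey Wolf Optimizer in pure Python. The abstract does not state exactly what GWO optimizes, so the sketch assumes a generic objective to minimize; the function name `gwo_minimize`, its parameters, and the toy feature-selection fitness in the demo are hypothetical illustrations, not the authors' implementation. Each wolf moves to the average of three estimates derived from the alpha, beta, and delta leaders, while the coefficient `a` shrinks linearly from 2 toward 0 to shift the search from exploration to exploitation.

```python
import random
from typing import Callable, List, Sequence, Tuple


def gwo_minimize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_wolves: int = 20,
    n_iters: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Minimize `objective` over [lower, upper]^dim with a standard Grey Wolf Optimizer.

    Hypothetical helper: returns the best position, its score, and the best score
    recorded after each iteration.
    """
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    # Random initial pack inside the search box.
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    scores = [objective(w) for w in wolves]

    # Alpha, beta, delta: the three best solutions found so far.
    ranked = sorted(range(n_wolves), key=lambda i: scores[i])
    leaders = [(list(wolves[i]), scores[i]) for i in ranked[:3]]
    history: List[float] = []

    for t in range(n_iters):
        a = 2.0 - 2.0 * t / n_iters  # decreases linearly from 2 toward 0
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                x = wolves[i][d]
                estimates = []
                for leader_pos, _ in leaders:
                    A = 2.0 * a * rng.random() - a  # step coefficient in [-a, a]
                    C = 2.0 * rng.random()          # random emphasis on the leader
                    D = abs(C * leader_pos[d] - x)  # distance to this leader
                    estimates.append(leader_pos[d] - A * D)
                # Average the alpha/beta/delta estimates and clamp to the box.
                new_pos.append(min(max(sum(estimates) / 3.0, lower), upper))
            wolves[i] = new_pos
            scores[i] = objective(new_pos)

        # Keep the best three among previous leaders and the moved pack.
        pool = leaders + [(list(w), s) for w, s in zip(wolves, scores)]
        pool.sort(key=lambda item: item[1])
        leaders = pool[:3]
        history.append(leaders[0][1])

    best_pos, best_score = leaders[0]
    return best_pos, best_score, history


if __name__ == "__main__":
    # Hypothetical feature-selection use: positions in [0, 1] are thresholded at 0.5
    # into a feature mask. `informative` stands in for the unknown useful features;
    # a real fitness would be a classifier's validation error on the selected features.
    informative = [1, 0, 1, 1, 0, 0, 1, 0]

    def toy_fitness(pos: Sequence[float]) -> float:
        mask = [1 if v > 0.5 else 0 for v in pos]
        return sum(m != k for m, k in zip(mask, informative)) + 0.01 * sum(mask)

    best, score, _ = gwo_minimize(toy_fitness, dim=len(informative), lower=0.0, upper=1.0)
    print("selected mask:", [1 if v > 0.5 else 0 for v in best], "fitness:", round(score, 3))
```

Keeping alpha, beta, and delta as the best solutions seen so far makes the recorded best score non-increasing across iterations. For selecting among concatenated MobileNetV2 and EfficientNetB3 features, a practical fitness would typically combine a classifier's validation error with a penalty on the number of selected features; that design is an assumption here, not a detail reported in the abstract.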
On the test set, the MEGWO-LCCHC technique achieved high classification accuracy: 94.8% with the lightweight deep neural network (DNN) classifier, 93.9% with LightGBM, 93.5% with XGBoost, and 93.3% with CatBoost.
The findings suggest that our approach enhances classification performance and offers improved generalizability for real-world clinical applications. The proposed MEGWO-LCCHC framework shows promise as a robust tool in cancer diagnostics, advancing the application of AI in oncology.