Narasimha Raju Akella S, Venkatesh K, Rajababu M, Kumar Gatla Ranjith, Jakeer Hussain Shaik, Satya Mohan Chowdary G, Ganga Bhavani T, Kareemullah Mohammed, Algburi Sameer, Majdi Ali, Abdulhadi Ahmed M, Ahmad Khan Wahaj
Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamilnadu, 603203, India.
Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamilnadu, 603203, India.
BMC Med Imaging. 2025 Jul 15;25(1):283. doi: 10.1186/s12880-025-01826-7.
Colorectal cancer (CRC) is the second most common cause of cancer-related mortality worldwide, underscoring the need for computer-aided diagnosis (CADx) systems that are interpretable, accurate, and robust. This study presents a practical CADx system that combines Vision Transformers (ViTs) and DeepLabV3+ to accurately identify and segment colorectal lesions in colonoscopy images. The system addresses class imbalance and real-world complexity through PCA-based dimensionality reduction, data augmentation, and strategic preprocessing, using the recently curated CKHK-22 dataset, which comprises more than 14,000 annotated images from CVC-ClinicDB, Kvasir-2, and Hyper-Kvasir. Classification performance was compared across ViT, ResNet-50, DenseNet-201, and VGG-16. ViT achieved the best accuracy (97%), F1-score (0.95), and AUC (92%) on the test data. For lesion localisation, DeepLabV3+ achieved state-of-the-art segmentation, with a Dice coefficient of 0.88 and an Intersection over Union (IoU) of 0.71, giving sharp delineation of malignant areas. The CADx system supports real-time inference and is served through Google Cloud to enable scalable clinical implementation. Image-level segmentation effectiveness is demonstrated by comparing visual overlays with manually delineated expert masks, and performance is quantified using precision, recall, F1-score, and AUC. The hybrid strategy not only outperforms traditional CNN approaches but also addresses important clinical needs such as early detection, handling of highly imbalanced classes, and clear explanation. By using self-attention and multi-scale contextual learning, the proposed ViT-DeepLabV3+ system establishes a basis for advanced AI support in colorectal cancer diagnosis.
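The abstract lists PCA-based dimensionality reduction as a preprocessing step but does not say which representation it is applied to or how many components are kept. The following is a minimal NumPy sketch of PCA via thin SVD, assuming flattened feature vectors (one row per image) and a caller-chosen `n_components`. Both assumptions and the function names `pca_fit` and `pca_transform` are hypothetical; this is not the authors' pipeline.

```python
import numpy as np

def pca_fit(X: np.ndarray, n_components: int):
    """Fit PCA on the rows of X (n_samples x n_features).

    Returns the feature mean, the top principal directions (as rows),
    and the fraction of total variance each retained direction explains.
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # centre each feature
    # Thin SVD of the centred data: rows of Vt are principal directions,
    # already sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (X.shape[0] - 1)  # variance along each direction
    ratio = var / var.sum()
    return mean, Vt[:n_components], ratio[:n_components]

def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project rows of X onto the fitted principal directions."""
    return (X - mean) @ components.T

# Usage with synthetic data standing in for flattened image features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 20))
mean, comps, ratio = pca_fit(X_train, n_components=5)
Z_train = pca_transform(X_train, mean, comps)  # shape (50, 5)
```

In practice the mean and components would be fitted on the training split only and reused unchanged for validation and test images, so that no information from held-out data leaks into preprocessing.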
The system offers a high-capacity, reproducible, computerised solution for colorectal cancer screening and monitoring. It is well suited to resource-limited settings and attractive for clinical deployment.
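The segmentation and classification figures reported in the abstract (Dice coefficient, IoU, precision, recall, F1-score) can all be derived from true-positive, false-positive and false-negative counts. The following is a minimal NumPy sketch for a single pair of binary masks, assuming boolean arrays of equal shape. The function names are hypothetical, and this is not the authors' evaluation code. Returning 0.0 when a denominator is zero is a convention chosen for this sketch; some toolkits instead score two empty masks as 1.0.

```python
import numpy as np

def confusion_counts(pred: np.ndarray, target: np.ndarray):
    """Return (tp, fp, fn) pixel counts for two binary masks of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = int(np.logical_and(pred, target).sum())   # lesion predicted as lesion
    fp = int(np.logical_and(pred, ~target).sum())  # background predicted as lesion
    fn = int(np.logical_and(~pred, target).sum())  # lesion missed
    return tp, fp, fn

def _safe_div(num: float, den: float) -> float:
    # Sketch convention: define the score as 0.0 when the denominator is zero.
    return num / den if den > 0 else 0.0

def dice(pred, target) -> float:
    tp, fp, fn = confusion_counts(pred, target)
    return _safe_div(2 * tp, 2 * tp + fp + fn)

def iou(pred, target) -> float:
    tp, fp, fn = confusion_counts(pred, target)
    return _safe_div(tp, tp + fp + fn)

def precision_recall_f1(pred, target):
    tp, fp, fn = confusion_counts(pred, target)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1

# Usage: the prediction covers a 2x3 block, the ground truth a 2x2 block inside it.
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:3] = True
target = np.zeros((4, 4), dtype=bool)
target[0:2, 0:2] = True
scores = (dice(pred, target), iou(pred, target), precision_recall_f1(pred, target))
```

Here tp = 4, fp = 2 and fn = 0, so Dice = 0.8 and IoU = 4/6. For a single mask pair, Dice equals the pixel-level F1-score and IoU = Dice / (2 - Dice). Averages of per-image scores over a dataset need not satisfy the same identity.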