Explainable CNN-Radiomics Fusion and Ensemble Learning for Multimodal Lesion Classification in Dental Radiographs.

Author Information

Can Zuhal, Aydin Emre

Affiliation

Computer Engineering Department, Engineering and Architecture Faculty, Eskisehir Osmangazi University, Eskisehir 26040, Türkiye.

Publication Information

Diagnostics (Basel). 2025 Aug 9;15(16):1997. doi: 10.3390/diagnostics15161997.

Abstract

Background/Objectives: Clinicians routinely rely on periapical radiographs to identify root-end disease, but interpretation errors and inconsistent readings compromise diagnostic accuracy. We therefore developed an explainable, multimodal AI framework that (i) fuses two data modalities, deep CNN embeddings and radiomic texture descriptors extracted only from lesion-relevant pixels selected by Grad-CAM, and (ii) makes every prediction transparent through dual-layer explainability (pixel-level Grad-CAM heatmaps + feature-level SHAP values).

Methods: A dataset of 2285 periapical radiographs was processed using six CNN architectures (EfficientNet-B1/B4/V2M/V2S, ResNet-50, Xception). For each image, a Grad-CAM heatmap generated from the penultimate layer of the CNN was thresholded to create a binary mask that delineated the region most responsible for the network's decision. Radiomic features (first-order, GLCM, GLRLM, GLDM, NGTDM, and shape2D) were then computed only within that mask, ensuring that handcrafted descriptors and learned embeddings referred to the same anatomic focus. The two feature streams were concatenated, optionally reduced by principal component analysis or SelectKBest, and fed to random forest or XGBoost classifiers; five-view test-time augmentation (TTA) was applied at inference. Pixel-level interpretability was provided by the original Grad-CAM, while SHAP quantified the contribution of each radiomic and deep feature to the final vote.

Results: Raw CNNs achieved ca. 52% accuracy and AUC values near 0.60. The multimodal fusion raised performance dramatically: the Xception + radiomics + random forest model achieved 95.4% accuracy and an AUC of 0.9867, and adding TTA increased these to 96.3% and 0.9917, respectively. The top ensemble, Xception and EfficientNet-V2S fusion vectors classified with XGBoost under five-view TTA, reached 97.16% accuracy and an AUC of 0.9914, with false-positive and false-negative rates of 4.6% and 0.9%, respectively. Grad-CAM heatmaps consistently highlighted periapical regions, while SHAP plots revealed that radiomic texture heterogeneity and high-level CNN features jointly contributed to correct classifications.

Conclusions: By tightly integrating CNN embeddings, mask-targeted radiomics, and a two-tiered explainability stack (Grad-CAM + SHAP), the proposed system delivers state-of-the-art lesion detection with a transparent decision process, addressing both accuracy and trust.
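
For readers who want a concrete picture of the mask-targeted radiomics step, the sketch below is a minimal illustration, not the authors' code: it assumes a Grad-CAM heatmap has already been computed for the radiograph, binarizes it with an assumed relative threshold of 0.5, and extracts the feature classes named in the abstract with PyRadiomics. The synthetic data and helper names are placeholders.

```python
import numpy as np
import SimpleITK as sitk
from radiomics import featureextractor

def gradcam_mask(heatmap, rel_threshold=0.5):
    """Binarize a Grad-CAM heatmap, keeping pixels above a fraction of the peak
    activation. The 0.5 fraction is an assumed value; the abstract only states
    that the heatmap is thresholded into a binary mask."""
    h = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
    return (h >= rel_threshold).astype(np.uint8)

def radiomics_in_mask(image, mask):
    """Compute first-order, GLCM, GLRLM, GLDM, NGTDM and 2D-shape features only
    inside the Grad-CAM-derived mask, using PyRadiomics on a single 2D slice."""
    extractor = featureextractor.RadiomicsFeatureExtractor(force2D=True)
    extractor.disableAllFeatures()
    for cls in ("firstorder", "glcm", "glrlm", "gldm", "ngtdm", "shape2D"):
        extractor.enableFeatureClassByName(cls)
    # PyRadiomics works on volumes, so the 2D radiograph is wrapped as a
    # single-slice volume; force2D keeps the texture computation in-plane.
    img_itk = sitk.GetImageFromArray(image[np.newaxis].astype(np.float32))
    msk_itk = sitk.GetImageFromArray(mask[np.newaxis].astype(np.uint8))
    result = extractor.execute(img_itk, msk_itk)
    # Drop diagnostic metadata entries; keep numeric feature values only.
    return {k: float(v) for k, v in result.items() if not k.startswith("diagnostics")}

# Toy usage with synthetic data (a real pipeline would pass the radiograph and
# the heatmap produced by the CNN's penultimate layer, e.g. from Xception).
rng = np.random.default_rng(0)
radiograph = rng.normal(loc=100, scale=20, size=(224, 224))
heatmap = np.zeros((224, 224))
heatmap[80:150, 90:160] = 1.0
features = radiomics_in_mask(radiograph, gradcam_mask(heatmap))
```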
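
A similarly hedged sketch of the downstream steps, fusion by concatenation, optional SelectKBest reduction, an XGBoost classifier, five-view TTA by probability averaging, and feature-level SHAP, might look as follows; the array shapes, the k value, and the hyperparameters are illustrative assumptions rather than values reported in the paper.

```python
import numpy as np
import shap
from sklearn.feature_selection import SelectKBest, f_classif
from xgboost import XGBClassifier

# Placeholder arrays standing in for the real pipeline outputs: pooled CNN
# embeddings per radiograph and radiomic vectors computed inside the Grad-CAM masks.
rng = np.random.default_rng(0)
cnn_emb = rng.normal(size=(200, 1280))
radiomic = rng.normal(size=(200, 100))
labels = rng.integers(0, 2, size=200)

# Feature-level fusion: concatenate the two streams into one vector per image.
X = np.concatenate([cnn_emb, radiomic], axis=1)

# Optional reduction (SelectKBest shown; the paper also mentions PCA). k is assumed.
selector = SelectKBest(f_classif, k=256).fit(X, labels)
clf = XGBClassifier(n_estimators=400, eval_metric="logloss")
clf.fit(selector.transform(X), labels)

def predict_with_tta(view_features):
    """Five-view test-time augmentation: average the predicted class probabilities
    over the fused feature vectors of five augmented views of one radiograph."""
    probs = [clf.predict_proba(selector.transform(v.reshape(1, -1)))[0]
             for v in view_features]
    return np.mean(probs, axis=0)

# Feature-level explainability: SHAP values for the selected fused features.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(selector.transform(X[:50]))
```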

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/500c/12385016/f7272f67b34f/diagnostics-15-01997-g001.jpg
