

CVT-HNet: a fusion model for recognizing perianal fistulizing Crohn's disease based on CNN and ViT.

Author Information

Li Lanlan, Wang Ziyue, Wang Chongyang, Chen Tao, Deng Ke, Wei Hong'an, Wang Dabiao, Li Juan, Zhang Heng

Affiliations

College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China.

Fujian Key Laboratory for Intelligent Processing and Wireless Transmission of Media Information, Fuzhou University, Fuzhou, 350108, China.

Publication Information

BMC Med Imaging. 2025 Jul 28;25(1):298. doi: 10.1186/s12880-025-01833-8.

Abstract

BACKGROUND

Accurate identification of anal fistulas is essential, as it directly affects the severity of subsequent perianal infections, prognostic indicators, and overall treatment outcomes. Traditional manual recognition methods are inefficient, so computer vision methods have been adopted to improve efficiency. Convolutional neural networks (CNNs) form the main basis for detecting anal fistulas in current computer vision techniques. However, CNNs often struggle to capture long-range dependencies effectively, which limits how well they handle anal fistula images.

METHODS

This study proposes a new fusion model, CVT-HNet, that integrates MobileNet with a Vision Transformer (ViT). The design uses CNNs to extract local features and Transformer encoders to capture long-range dependencies. In addition, the MobileNetV2 backbone, augmented with a Coordinate Attention mechanism, and the encoder modules are optimized to improve the precision of anal fistula detection.
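The Coordinate Attention reweighting mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it omits the shared bottleneck transform of the published Coordinate Attention module, and `w_h`/`w_w` are hypothetical per-direction projection matrices standing in for learned 1×1 convolutions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_h, w_w):
    """Simplified coordinate attention over a feature map x of shape (C, H, W).

    w_h, w_w: hypothetical (C, C) projection matrices for the height- and
    width-direction gates (the real module uses a shared bottleneck conv).
    """
    z_h = x.mean(axis=2)           # pool along width  -> (C, H)
    z_w = x.mean(axis=1)           # pool along height -> (C, W)
    a_h = sigmoid(w_h @ z_h)       # height-direction gate, values in (0, 1)
    a_w = sigmoid(w_w @ z_w)       # width-direction gate, values in (0, 1)
    # Broadcast the two direction-aware gates back over the input map,
    # so each position (h, w) is scaled by its row and column importance.
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Unlike plain channel attention (e.g., squeeze-and-excitation), the two pooled vectors preserve positional information along each axis, which is why this mechanism can sharpen spatial localization of the fistula region.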

RESULTS

Comparative experiments show that CVT-HNet achieves an accuracy of 80.66% with strong robustness, surpassing both pure Transformer architectures and other fusion networks. Internal validation demonstrates the reliability and consistency of CVT-HNet, and external validation shows good transferability and generalizability. In visualization analysis, CVT-HNet focuses more tightly on the region of interest in anal fistula images. Finally, the contribution of each CVT-HNet component module is evaluated through ablation experiments.

CONCLUSION

The experimental results highlight the superior performance and practicality of CVT-HNet in detecting anal fistulas. By combining local and global information, the model achieves high accuracy, robustness, and generalizability, making it suitable for real-world applications where data variability is common. These findings underscore its effectiveness in clinical contexts.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/beb5/12305962/cf3a516079f0/12880_2025_1833_Fig1_HTML.jpg
