Chang Shih-Fang, Wu Po-Yi, Tsai Ming-Chang, Tseng Vincent S, Wang Chi-Chih
Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan.
Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan.
Front Artif Intell. 2025 Jul 23;8:1618607. doi: 10.3389/frai.2025.1618607. eCollection 2025.
Abdominal ultrasonography is a primary diagnostic tool for evaluating medical conditions within the abdominal cavity. Accurate determination of the relative locations of intra-abdominal organs and lesions based on anatomical features in ultrasound images is essential in diagnostic sonography. Recognizing and extracting anatomical landmarks facilitates lesion evaluation and enhances diagnostic interpretation. Recent artificial intelligence (AI) segmentation methods based on deep neural networks (DNNs) and transformers struggle to preserve feature-dependency information while remaining computationally efficient, which limits their clinical applicability.
The anatomical structure recognition framework, MaskHybrid, was developed using a private dataset comprising 34,711 abdominal ultrasound images of 2,063 patients from CSMUH. The dataset included abdominal organs and vascular structures (hepatic vein, inferior vena cava, portal vein, gallbladder, kidney, pancreas, spleen) and liver lesions (hepatic cyst, tumor). MaskHybrid adopted a Mamba-transformer hybrid architecture consisting of an evolved backbone network, an encoder, and a corresponding decoder to capture long-range spatial dependencies and contextual information effectively. This design improved image segmentation capability in visual tasks while mitigating the computational burden of the transformer-based attention mechanism.
Experiments on the retrospective dataset achieved a mean average precision (mAP) of 74.13% for anatomical landmark segmentation in abdominal ultrasound images. The proposed framework outperformed baselines across most organ and lesion types and effectively segmented challenging anatomical structures. Moreover, MaskHybrid had a significantly shorter inference time (0.120 ± 0.013 s), running 2.5 times faster than large AI models of similar size. By combining Mamba and transformer architectures, the hybrid design is well suited to the timely segmentation of complex anatomical structures in abdominal ultrasonography, where both accuracy and efficiency are critical in clinical practice.
The proposed Mamba-transformer hybrid recognition framework simultaneously detects and segments multiple abdominal organs and lesions in ultrasound images. It achieves superior segmentation accuracy, visualization quality, and inference efficiency, thereby facilitating improved medical image interpretation and near real-time diagnostic sonography that meets clinical needs.
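As an illustration of the building block behind a segmentation mAP such as the one reported above, the following minimal sketch computes the intersection over union (IoU) of two binary masks and applies a matching threshold. It is not the authors' evaluation code: the abstract does not state the mAP protocol used, and the names `mask_iou` and `is_true_positive`, the list-of-lists mask representation, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List

# Assumed representation: a binary mask as rows of 0/1 values,
# where 1 marks a pixel belonging to the structure.
Mask = List[List[int]]


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match (a convention
    # chosen for this sketch, not taken from the paper).
    return inter / union if union else 1.0


def is_true_positive(pred: Mask, truth: Mask, threshold: float = 0.5) -> bool:
    """A prediction counts as a match when its IoU reaches the threshold."""
    return mask_iou(pred, truth) >= threshold


# Hypothetical 2x3 masks: 2 overlapping pixels out of 4 in the union.
predicted = [[1, 1, 0],
             [0, 1, 0]]
annotated = [[1, 0, 0],
             [0, 1, 1]]
print(mask_iou(predicted, annotated))          # 0.5
print(is_true_positive(predicted, annotated))  # True
```

Average precision for one class is then obtained by ranking predictions by confidence and summarizing precision against recall over these matches, and mAP averages that value across classes (and, in some protocols, across several IoU thresholds).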