Zhou Tao, Ye Xinyu, Lu Huiling, Guo Yujie, Wang Hongxia, Liu Yang
School of Computer Science and Engineering, North Minzu University, Yinchuan, 750021, China.
Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan, 750021, China.
Sci Rep. 2024 Dec 28;14(1):30719. doi: 10.1038/s41598-024-79786-1.
Multi-modal medical images are important for tumor lesion detection. However, existing detection models use only a single modality to detect lesions, insufficiently considering multi-modal semantic correlations and lacking the ability to represent the shape, size, and contrast features of lesions. A Cross-Modal YOLOv5 model (CMYOLOv5) is therefore proposed. First, the model comprises two networks: an auxiliary network with a dual-branch structure that extracts semantic information from PET and CT, and a backbone network based on YOLOv5 that extracts semantic information from PET/CT. Second, a Cross-modal Feature Fusion (CFF) module is designed in the auxiliary network to fuse PET functional information with CT anatomical information, and a Self-Adaptive Attention Fusion (AAF) module is designed in the backbone network to fuse and enhance the complementary information of the three modalities. Third, a Self-Adaptive Transformer (SAT) is designed in the feature-enhancement neck: a Transformer with a deformable attention mechanism focuses on the lung tumor region, and an MLP with a channel attention mechanism strengthens the feature representation of that region. Finally, a Reparameterized Residual Block (RRB) and a Reparameterized Convolution operation (RC) are designed to fully learn richer PET, CT, and PET/CT features. Comparative experiments on a clinical lung tumor PET/CT multi-modality dataset verify the effectiveness of CMYOLOv5 in terms of Precision, Recall, mAP, F1, FPS, and training time, with results of 97.16%, 96.41%, 97.18%, 96.78%, 96.37, and 3912 s, respectively. CMYOLOv5 detects irregular lung tumors with high precision and outperforms existing advanced methods.
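The Reparameterized Convolution operation (RC) mentioned in the abstract suggests a RepVGG-style structural reparameterization: during training a block runs parallel 3x3, 1x1, and identity branches, and at inference their kernels are algebraically folded into a single 3x3 convolution. A minimal single-channel sketch of that folding follows; all names are illustrative assumptions, not the authors' code:

```python
def conv3x3_same(img, k):
    # Zero-padded "same" 3x3 cross-correlation on a single-channel image
    # (img and k are plain nested lists of floats).
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in range(3):
                for dx in range(3):
                    yy, xx = y + dy - 1, x + dx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        s += img[yy][xx] * k[dy][dx]
            out[y][x] = s
    return out

def merge_branches(k3, k1):
    # Fold the 1x1 branch (scalar weight k1) and the identity branch into
    # the 3x3 kernel: both act only on the center tap, so they add to k3[1][1].
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0  # +1.0 accounts for the identity branch
    return merged

if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    k3 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
    k1 = 0.25

    # Training-time output: sum of the three parallel branches.
    base = conv3x3_same(img, k3)
    train = [[base[y][x] + k1 * img[y][x] + img[y][x] for x in range(3)]
             for y in range(3)]

    # Inference-time output: one convolution with the merged kernel.
    fused = conv3x3_same(img, merge_branches(k3, k1))

    assert all(abs(train[y][x] - fused[y][x]) < 1e-9
               for y in range(3) for x in range(3))
    print("merged kernel matches branch sum")
```

The folding is exact (not an approximation) because convolution is linear in its kernel, which is why reparameterized models keep training-time accuracy while paying single-branch inference cost.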