Research Center for Medical AI, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China.
Int J Comput Assist Radiol Surg. 2024 Oct;19(10):2001-2009. doi: 10.1007/s11548-024-03230-y. Epub 2024 Jul 14.
Differentiating pulmonary lymphoma from lung infections using CT images is challenging. Existing deep neural network-based lung CT classification models rely on 2D slices, which lack comprehensive information and require manual slice selection. 3D models that split the volume into chunks compromise image information and are difficult to reduce in parameter count, which limits their performance. These limitations must be addressed to improve accuracy and practicality.
Inspired by the clinical practice of reading a sequence of cross-sectional slices for diagnosis, we propose a transformer-based sequential feature encoding structure that integrates multi-level information from complete CT images. We incorporate position encoding and cross-level long-range information fusion modules into the CNN that extracts features from the cross-sectional slices, ensuring high-precision feature extraction.
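As an illustration of what attaching position information to a sequence of slice features can look like, the following is a minimal sketch. The abstract does not say which form of position encoding is used, so the fixed sinusoidal scheme below is an assumption, and the function names `sinusoidal_position_encoding` and `add_slice_positions` are hypothetical, not taken from the released code.

```python
import math


def sinusoidal_position_encoding(num_slices, dim):
    """Return a num_slices x dim table of fixed sinusoidal position codes.

    Assumption: a standard sine/cosine scheme; the paper's actual
    position encoding may differ.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_slices):
        row = []
        for i in range(0, dim, 2):
            # Each pair of channels shares one frequency.
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


def add_slice_positions(slice_features):
    """Add a position code to each per-slice feature vector.

    slice_features: list of equal-length lists, one per CT slice,
    ordered along the scan axis (hypothetical input format).
    """
    num_slices = len(slice_features)
    dim = len(slice_features[0])
    codes = sinusoidal_position_encoding(num_slices, dim)
    return [
        [f + c for f, c in zip(feat, code)]
        for feat, code in zip(slice_features, codes)
    ]
```

With codes like these added, a sequence encoder can tell apart slices that have similar content but sit at different positions in the scan.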
We conducted comprehensive experiments on a dataset of 124 patients, split into 64, 20 and 40 patients for training, validation and testing, respectively. Ablation and comparative experiments demonstrated the effectiveness of our approach. Our method outperforms existing state-of-the-art methods on the 3D CT image classification problem of distinguishing lung infections from pulmonary lymphoma, achieving an accuracy of 0.875, an AUC of 0.953 and an F1 score of 0.889.
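For readers unfamiliar with the three reported metrics, the sketch below shows one common way to compute accuracy, F1 score and AUC for a binary classifier. It is not the authors' evaluation code. The abstract does not state which class is treated as positive, so the `positive` argument is an assumption, and the toy labels and scores in the usage note are invented purely for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive case is scored
    above a random negative case, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On a made-up example with labels `[1, 1, 0, 0]`, scores `[0.9, 0.4, 0.6, 0.1]` and a 0.5 decision threshold, these functions give an accuracy of 0.5, an F1 score of 0.5 and an AUC of 0.75.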
The experiments verified that our proposed position-enhanced transformer-based sequential feature encoding model can effectively perform high-precision feature extraction and contextual feature fusion in the lungs. It enhances the feature extraction ability of a standalone CNN or transformer, thereby improving classification performance. The source code is accessible at https://github.com/imchuyu/PTSFE.