Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072, China.
Department of Ultrasound, Zhongnan Hospital, Wuhan University, Wuhan, 430072, China.
BMC Bioinformatics. 2023 Aug 19;24(1):315. doi: 10.1186/s12859-023-05446-2.
Ultrasound (US) and infrared thermography (IRT) are two non-invasive, radiation-free, and inexpensive imaging technologies widely employed in medical applications. The US image obtained by ultrasound imaging primarily conveys the morphological information of a lesion (size, shape, contour boundary, and echo), whereas the infrared thermal image obtained by IRT primarily describes its thermodynamic functional information. Although distinguishing benign from malignant thyroid nodules requires both morphological and functional information, existing deep learning models are based only on US images, so malignant nodules with insignificant morphological changes but significant functional changes may go undetected.
Given that US and IRT images present thyroid nodules through distinct modalities, we propose an Adaptive multi-modal Hybrid (AmmH) classification model that leverages the combination of these two image types to achieve superior classification performance. AmmH constructs a hybrid single-modal encoder module for each modality, which extracts both local and global features by integrating a CNN module with a Transformer module. The features extracted from the two modalities are then weighted adaptively by an adaptive modality-weight generation network and fused by an adaptive cross-modal encoder module. The fused features are finally fed to an MLP for the classification of thyroid nodules. On the collected dataset, our AmmH model achieved F1 and F2 scores of 97.17% and 97.38%, respectively, significantly outperforming the single-modal models. The results of four ablation experiments further demonstrate the superiority of the proposed method.
The proposed multi-modal model extracts features from images of multiple modalities, making the description of thyroid nodules more comprehensive. The adaptive modality-weight generation network lets the model attend adaptively to the different modalities, and the adaptive cross-modal encoder fuses their features using these adaptive weights. The model consequently achieves promising classification performance, indicating its potential as a non-invasive, radiation-free, and cost-effective screening tool for distinguishing benign from malignant thyroid nodules. The source code is available at https://github.com/wuliZN2020/AmmH.
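The adaptive weighting described above can be illustrated with a minimal NumPy sketch: a toy gating function maps the concatenated US and IRT feature vectors to one logit per modality, normalizes the logits with a softmax so the weights sum to one, and fuses the two feature vectors by their weighted sum. All names and shapes here are illustrative assumptions; the paper's actual modality-weight generation network and cross-modal encoder are learned modules and considerably more elaborate.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_fuse(f_us, f_irt, w_gate):
    """Hypothetical stand-in for adaptive modality weighting.

    f_us, f_irt : feature vectors from the US and IRT encoders
    w_gate      : toy gating parameters mapping the concatenated
                  features to one logit per modality
    """
    concat = np.concatenate([f_us, f_irt])  # joint view of both modalities
    logits = w_gate @ concat                # one logit per modality
    w = softmax(logits)                     # adaptive weights, sum to 1
    fused = w[0] * f_us + w[1] * f_irt      # weighted sum stands in for cross-modal fusion
    return fused, w

rng = np.random.default_rng(0)
f_us = rng.standard_normal(8)               # toy US feature vector
f_irt = rng.standard_normal(8)              # toy IRT feature vector
w_gate = rng.standard_normal((2, 16))       # toy gating parameters
fused, w = adaptive_fuse(f_us, f_irt, w_gate)
```

In the full model the fused representation would then pass through the cross-modal encoder and the MLP classifier; here the weighted sum simply makes the weighting mechanism concrete.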