Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, College of Medicine, University of Ulsan, Seoul, Republic of Korea.
Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea.
Med Image Anal. 2023 Oct;89:102894. doi: 10.1016/j.media.2023.102894. Epub 2023 Jul 12.
A major responsibility of radiologists in routine clinical practice is reading follow-up chest radiographs (CXRs) to identify changes in a patient's condition. Diagnosing meaningful changes in follow-up CXRs is challenging because radiologists must differentiate disease changes from natural or benign variations. Here, we propose a multi-task Siamese convolutional vision transformer (MuSiC-ViT) with an anatomy-matching module (AMM) that mimics the radiologist's cognitive process for differentiating change from no-change between baseline and follow-up CXRs. MuSiC-ViT adopts the CNNs Meet Vision Transformers (CMT) model, which combines CNN and transformer architectures, and has three major components: a Siamese network architecture, the AMM, and multi-task learning. Because the input is a pair of CXRs, a Siamese network was adopted as the encoder. The AMM is an attention module that focuses on anatomically corresponding regions in the CXR pair. To mimic a radiologist's cognitive process, MuSiC-ViT was trained with multi-task learning on three objectives: normal/abnormal classification, change/no-change classification, and anatomy matching. From 406 K CXRs studied, 88 K change and 115 K no-change pairs were acquired for the training dataset; the internal validation dataset consisted of 1,620 pairs. To demonstrate the robustness of MuSiC-ViT, we verified the results on two additional validation datasets. MuSiC-ViT achieved accuracies and areas under the receiver operating characteristic curve of 0.728 and 0.797 on the internal validation dataset, 0.614 and 0.784 on an external validation dataset, and 0.745 and 0.858 on a temporally separated validation dataset, respectively. All code is available at https://github.com/chokyungjin/MuSiC-ViT.
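The pipeline the abstract describes — a weight-shared (Siamese) encoder applied to both CXRs, a cross-attention anatomy-matching step, and separate heads for the change/no-change and normal/abnormal tasks — can be sketched schematically. The following is a toy NumPy sketch under loose assumptions, not the authors' implementation (their code is at the repository above); all shapes, names (`encode`, `anatomy_matching`, `forward`), and the tiny linear "encoder" standing in for the CMT backbone are hypothetical illustrations of the data flow only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the shared CMT encoder: one weight matrix used for
# BOTH images, which is what makes the architecture Siamese.
W_enc = rng.standard_normal((64, 16)) * 0.1

def encode(x):
    """Map a flattened toy 'CXR' (64,) to 4 token features of dim 4."""
    return np.tanh(x @ W_enc).reshape(4, 4)

def anatomy_matching(f1, f2):
    """AMM sketch: cross-attention aligning image-2 tokens to image-1 tokens."""
    scores = f1 @ f2.T / np.sqrt(f1.shape[1])          # token-to-token similarity
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row softmax
    return attn @ f2                                    # image-2 features, anatomy-aligned

# Two task heads, reflecting the multi-task objectives (weights hypothetical).
W_change = rng.standard_normal(8) * 0.1   # change/no-change head
W_abn = rng.standard_normal(4) * 0.1      # normal/abnormal head

def forward(x1, x2):
    f1, f2 = encode(x1), encode(x2)       # same W_enc for both inputs (Siamese)
    matched = anatomy_matching(f1, f2)
    pooled = np.concatenate([f1.mean(axis=0), matched.mean(axis=0)])
    p_change = 1.0 / (1.0 + np.exp(-(pooled @ W_change)))   # sigmoid
    p_abnormal = 1.0 / (1.0 + np.exp(-(f2.mean(axis=0) @ W_abn)))
    return p_change, p_abnormal

x1, x2 = rng.standard_normal(64), rng.standard_normal(64)  # toy baseline/follow-up pair
p_change, p_abnormal = forward(x1, x2)
```

In training, the three losses (change classification, abnormality classification, and the anatomy-matching objective) would be combined, which is the multi-task learning the abstract refers to; this sketch shows only the forward data flow.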