Xiang Yi, Acharya Rajendra, Le Quan, Tan Jen Hong, Chng Chiaw-Ling
Office of Insights & Analytics, Division of Digital Strategy, SingHealth, Singapore, Singapore.
School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield Central, QLD, Australia.
Front Artif Intell. 2025 Jul 24;8:1618426. doi: 10.3389/frai.2025.1618426. eCollection 2025.
Thyroid nodule segmentation in ultrasound (US) images is a valuable yet challenging task, playing a critical role in diagnosing thyroid cancer. The difficulty arises from factors such as the absence of prior knowledge about the thyroid region, low contrast between anatomical structures, and speckle noise, all of which obscure boundary detection and introduce variability in nodule appearance across different images.
To address these challenges, we propose a transformer-based model for thyroid nodule segmentation. Unlike traditional convolutional neural networks (CNNs), transformers capture global context from the first layer, enabling a more comprehensive image representation, which is crucial for identifying subtle nodule boundaries. In this study, we first pre-train a Masked Autoencoder (MAE) to reconstruct masked patches, then fine-tune it on thyroid US data, and further explore a cross-attention mechanism to enhance information flow between the encoder and decoder.
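The patch-masking step at the heart of MAE pre-training can be sketched as follows. This is a minimal illustration, not the authors' implementation; the 75% mask ratio and 16x16 patch size are assumptions borrowed from the original MAE setup, and `mask_patches` is a hypothetical helper name.

```python
import numpy as np

def mask_patches(patches, mask_ratio=0.75, seed=0):
    """Randomly hide a fraction of image patches, as in MAE pre-training.

    Returns the visible patches (the only ones fed to the encoder) and the
    indices of the masked patches, whose pixels the decoder must reconstruct.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_masked = int(n * mask_ratio)
    perm = rng.permutation(n)
    masked_idx = perm[:n_masked]
    visible_idx = perm[n_masked:]
    return patches[visible_idx], visible_idx, masked_idx

# A 224x224 ultrasound frame split into 16x16 patches yields 196 patches.
patches = np.zeros((196, 16 * 16))
visible, visible_idx, masked_idx = mask_patches(patches)
# With a 75% mask ratio the encoder sees only 49 of the 196 patches;
# the reconstruction loss is computed on the remaining 147.
```

Because the encoder processes only the visible quarter of the patches, pre-training is cheap relative to full-image training, which is consistent with the faster convergence reported below.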
Our experiments on the public AIMI, TN3K, and DDTI datasets show that MAE pre-training accelerates convergence. However, overall improvements are modest: the model achieves Dice Similarity Coefficient (DSC) scores of 0.63, 0.64, and 0.65 on AIMI, TN3K, and DDTI, respectively, highlighting limitations under small-sample conditions. Furthermore, adding cross-attention did not yield consistent gains, suggesting that data volume and diversity may be more critical than additional architectural complexity.
MAE pre-training notably reduces training time and helps the model learn transferable features, yet overall accuracy remains constrained by limited data and nodule variability. Future work will focus on scaling up data, pre-training the cross-attention layers, and exploring hybrid architectures to further boost segmentation performance.