Droste Richard, Cai Yifan, Sharma Harshita, Chatelain Pierre, Drukker Lior, Papageorghiou Aris T, Noble J Alison
Department of Engineering Science, University of Oxford, UK.
Nuffield Department of Women's & Reproductive Health, University of Oxford, UK.
Inf Process Med Imaging. 2019 Jun;26:592-604. doi: 10.1007/978-3-030-20351-1_46. Epub 2019 May 22.
Image representations are commonly learned from class labels, which are a simplistic approximation of human image understanding. In this paper we demonstrate that transferable representations of images can be learned without manual annotations by modeling human visual attention. The basis of our analyses is a unique gaze tracking dataset of sonographers performing routine clinical fetal anomaly screenings. Models of sonographer visual attention are learned by training a convolutional neural network (CNN) to predict gaze on ultrasound video frames through visual saliency prediction or gaze-point regression. We evaluate the transferability of the learned representations to the task of ultrasound standard plane detection in two contexts. Firstly, we perform transfer learning by fine-tuning the CNN with a limited number of labeled standard plane images. We find that fine-tuning the saliency predictor is superior to training from random initialization, with an average F1-score improvement of 9.6% overall and 15.3% for the cardiac planes. Secondly, we train a simple softmax regression on the feature activations of each CNN layer in order to evaluate the representations independently of transfer learning hyper-parameters. We find that the attention models derive strong representations, approaching the precision of a fully-supervised baseline model for all but the last layer.
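The abstract does not specify the network architecture, loss, or input sizes, so the following is a minimal illustrative sketch of the saliency-prediction setup it describes: a small fully convolutional network trained to match gaze heat maps, using a KL-divergence loss between normalized spatial distributions (a common choice in visual saliency prediction; the toy architecture, the loss, and all tensor shapes below are assumptions, not the paper's actual model).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyCNN(nn.Module):
    """Toy fully convolutional saliency predictor (illustrative only;
    not the architecture used in the paper)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(128, 1, 1)  # 1-channel saliency logits

    def forward(self, x):
        return self.head(self.features(x))

def saliency_kl_loss(logits, target):
    """KL divergence between the predicted and ground-truth gaze maps,
    each normalized to a probability distribution over the spatial grid."""
    b = logits.size(0)
    log_p = F.log_softmax(logits.view(b, -1), dim=1)
    q = target.view(b, -1)
    q = q / q.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return F.kl_div(log_p, q, reduction="batchmean")

# One hypothetical training step on stand-in data: random tensors in place
# of ultrasound video frames and the corresponding gaze heat maps.
model = SaliencyCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
frames = torch.randn(8, 1, 224, 288)   # stand-in ultrasound frames
gaze_maps = torch.rand(8, 1, 28, 36)   # stand-in gaze maps (1/8 resolution)
loss = saliency_kl_loss(model(frames), gaze_maps)
opt.zero_grad()
loss.backward()
opt.step()
```

The gaze-point regression variant mentioned in the abstract would replace the spatial output head and KL loss with a regression target (e.g. 2-D gaze coordinates); the representation-learning idea is the same: the labels come from tracked gaze rather than manual annotation.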
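The layer-wise evaluation the abstract describes is a linear probe: a softmax (multinomial logistic) regression fitted on the frozen activations of one layer at a time. A sketch under the same assumptions follows; `probe_layer` is a hypothetical helper, and the spatial-average pooling, class count, and use of training accuracy are illustrative choices (in practice one would report accuracy or F1 on a held-out split).

```python
import torch
import torch.nn as nn

def probe_layer(model, layer, frames, labels, num_classes, steps=200):
    """Fits a softmax regression on the frozen, spatially pooled
    activations of `layer` and returns classification accuracy."""
    acts = {}
    hook = layer.register_forward_hook(lambda m, i, o: acts.update(z=o))
    with torch.no_grad():          # backbone stays frozen
        model(frames)
    hook.remove()
    feats = acts["z"].flatten(2).mean(-1)  # global average pool -> (N, C)
    clf = nn.Linear(feats.size(1), num_classes)  # softmax regression
    opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(clf(feats), labels).backward()
        opt.step()
    return (clf(feats).argmax(1) == labels).float().mean().item()

# Hypothetical usage with the sketch model above: probe one conv layer
# for standard plane classification (13 classes chosen arbitrarily here).
model = SaliencyCNN()
frames = torch.randn(64, 1, 224, 288)
labels = torch.randint(0, 13, (64,))
acc = probe_layer(model, model.features[4], frames, labels, num_classes=13)
```

Because only the linear classifier is trained, this evaluation isolates the quality of each layer's representation from fine-tuning hyper-parameters, which is the comparison the abstract draws against the fully-supervised baseline.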