Gao Riqiang, Huo Yuankai, Bao Shunxing, Tang Yucheng, Antic Sanja L, Epstein Emily S, Deppen Steve, Paulson Alexis B, Sandler Kim L, Massion Pierre P, Landman Bennett A
Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235, USA; Vanderbilt University Medical Center, Nashville, TN 37235, USA.
Neurocomputing (Amst). 2020 Jul 15;397:48-59. doi: 10.1016/j.neucom.2020.02.033. Epub 2020 Feb 15.
With the rapid development of image acquisition and storage, multiple images per class are commonly available for computer vision tasks (e.g., face recognition, object detection, and medical imaging). Recently, the recurrent neural network (RNN) has been widely combined with convolutional neural networks (CNNs) to perform image classification on ordered (sequential) data. In this paper, by permuting multiple images into multiple dummy orders, we generalize the ordered (longitudinal) "RNN+CNN" design to a novel unordered setting, called the Multi-path x-D Recurrent Neural Network (MxDRNN), for image classification. To the best of our knowledge, few (if any) existing studies have applied the RNN framework to unordered intra-class images to improve classification performance. Specifically, multiple learning paths are introduced in the MxDRNN to extract discriminative features by permuting the dummy orders of the inputs. Eight datasets from five different fields (MNIST, 3D-MNIST, CIFAR, VGGFace2, and lung screening computed tomography) are included to evaluate the performance of our method. The proposed MxDRNN improves on the baseline performance by a large margin across the different application fields (e.g., accuracy from 46.40% to 76.54% on the VGGFace2 test pose set, and AUC from 0.7418 to 0.8162 on the NLST lung dataset). Additionally, empirical experiments show that the MxDRNN is more robust to category-irrelevant attributes (e.g., expression and pose in face images), which can make image classification harder and limit algorithm generalizability. The code is publicly available.
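The multi-path idea described above can be illustrated with a minimal sketch. This is not the authors' released code or architecture: the per-image feature vectors stand in for CNN outputs, the scalar-weight `tanh` cell stands in for a real recurrent unit, and the names `dummy_orders`, `recurrent_path`, and `multi_path_embedding`, as well as the choice to average the paths, are all hypothetical and chosen only for illustration.

```python
import math
import random
from itertools import permutations


def dummy_orders(n_images, n_paths, seed=0):
    """Pick up to n_paths distinct orderings (dummy orders) of the image indices.

    Enumerates all n! permutations, so it is only suitable for small sets.
    """
    all_orders = list(permutations(range(n_images)))
    random.Random(seed).shuffle(all_orders)
    return all_orders[:n_paths]


def recurrent_path(features, order, w_in=0.5, w_rec=0.5):
    """Toy recurrent pass over one dummy order (assumed cell, not from the paper).

    Elementwise update: h_t = tanh(w_in * x_t + w_rec * h_{t-1}).
    """
    hidden = [0.0] * len(features[0])
    for idx in order:
        hidden = [
            math.tanh(w_in * x + w_rec * h)
            for x, h in zip(features[idx], hidden)
        ]
    return hidden


def multi_path_embedding(features, n_paths=3, seed=0):
    """Run one recurrent path per dummy order and average the final states."""
    orders = dummy_orders(len(features), n_paths, seed)
    finals = [recurrent_path(features, order) for order in orders]
    return [sum(column) / len(finals) for column in zip(*finals)]


if __name__ == "__main__":
    # Three unordered images of the same class, each reduced to a
    # two-dimensional feature vector (made-up values).
    image_features = [[0.2, 0.9], [0.4, 0.1], [0.7, 0.5]]
    print(multi_path_embedding(image_features, n_paths=3))
```

In this toy version, using every permutation as a path (here `n_paths=6` for three images) makes the averaged embedding independent of the order in which the images are supplied, which is the property an unordered image set calls for. With fewer paths the result is only an approximation to that order-independent average.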