High Performance Implementation of 3D Convolutional Neural Networks on a GPU.

Affiliations

College of Computer, National University of Defense Technology, Changsha 410073, China.

National Key Laboratory of Parallel and Distributed Processing, Changsha 410073, China.

Publication information

Comput Intell Neurosci. 2017;2017:8348671. doi: 10.1155/2017/8348671. Epub 2017 Nov 8.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5698830/

Abstract

Convolutional neural networks have proven highly successful in image classification, object tracking, and many other tasks based on 2D inputs. Recently, researchers have started to apply convolutional neural networks to video classification, where the input is 3D and requires far more memory and computation. FFT-based methods can reduce the amount of computation, but generally at the cost of a larger memory requirement. The Winograd Minimal Filtering Algorithm (WMFA), on the other hand, reduces the number of operations required, and thus speeds up the computation, without increasing the required memory. This strategy has been shown to be successful for 2D convolutional neural networks. We implement the algorithm for 3D convolutional neural networks, apply it to a popular 3D convolutional neural network used for video classification, and compare it to cuDNN. With our highly optimized implementation, we observe a twofold speedup over cuDNN for most of the 3D convolution layers in our test network.
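To make the operation-count argument concrete, the Python below is a minimal sketch of how a Winograd minimal filtering transform can be extended to 3D. It assumes the standard 1D transform F(2, 3) used for 2D CNNs (output size 2, filter size 3, input tile size 4) and nests it along all three axes to obtain F(2×2×2, 3×3×3). A 4×4×4 input tile and a 3×3×3 filter then produce a 2×2×2 output tile with 64 element-wise multiplications instead of the 216 a direct convolution needs. The tile size, the matrices `BT`, `G`, `AT`, and the helper names (`transform_3d`, `winograd_3d`, `direct_3d`, `max_abs_diff`) are illustrative assumptions, not the authors' GPU implementation.

```python
"""Sketch: 3D Winograd F(2x2x2, 3x3x3) built by nesting the 1D F(2, 3) transform.

Illustrative only (hypothetical helper names); not the paper's GPU implementation.
"""
import random

# Assumed 1D F(2, 3) transform matrices: output size m = 2, filter size r = 3,
# input tile size m + r - 1 = 4.
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]           # input transform, 4x4
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]          # filter transform, 4x3
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]          # output transform, 2x4

DIRECT_MULS = 2 ** 3 * 3 ** 3  # 8 outputs x 27 filter taps = 216
WINOGRAD_MULS = 4 ** 3         # one product per element of the 4x4x4 tile = 64


def transform_3d(mat, t):
    """Apply a p x q matrix along each of the three axes of a q x q x q tensor."""
    p, q = len(mat), len(mat[0])
    # Axis 0: q x q x q -> p x q x q
    a = [[[sum(mat[i][k] * t[k][j][l] for k in range(q)) for l in range(q)]
          for j in range(q)] for i in range(p)]
    # Axis 1: p x q x q -> p x p x q
    b = [[[sum(mat[j][k] * a[i][k][l] for k in range(q)) for l in range(q)]
          for j in range(p)] for i in range(p)]
    # Axis 2: p x p x q -> p x p x p
    return [[[sum(mat[l][k] * b[i][j][k] for k in range(q)) for l in range(p)]
             for j in range(p)] for i in range(p)]


def winograd_3d(d, g):
    """2x2x2 output tile from a 4x4x4 input tile d and a 3x3x3 filter g."""
    u = transform_3d(G, g)     # transformed filter, 4x4x4
    v = transform_3d(BT, d)    # transformed input tile, 4x4x4
    # Element-wise product: the only 64 general multiplications per tile.
    m = [[[u[i][j][k] * v[i][j][k] for k in range(4)] for j in range(4)]
         for i in range(4)]
    return transform_3d(AT, m)  # back to the 2x2x2 output tile


def direct_3d(d, g):
    """Reference: direct 3D cross-correlation (as in CNN layers), 216 multiplications."""
    return [[[sum(d[x + i][y + j][z + k] * g[i][j][k]
                  for i in range(3) for j in range(3) for k in range(3))
              for z in range(2)] for y in range(2)] for x in range(2)]


def max_abs_diff(a, b):
    """Largest absolute difference between two 2x2x2 tiles."""
    return max(abs(a[i][j][k] - b[i][j][k])
               for i in range(2) for j in range(2) for k in range(2))


if __name__ == "__main__":
    rng = random.Random(0)
    d = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)] for _ in range(4)]
    g = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(3)]
    print("max |winograd - direct| =", max_abs_diff(winograd_3d(d, g), direct_3d(d, g)))
    print("multiplications per tile:", DIRECT_MULS, "direct vs", WINOGRAD_MULS, "Winograd")
```

The two results agree up to floating-point rounding, while the multiplication count per output tile drops by a factor of 216 / 64 ≈ 3.4. In a full layer, the filter transform would plausibly be computed once per filter and reused across all input tiles, which amortizes the transform cost. That reuse pattern is an assumption about how such implementations are typically organized, not a description of the paper's kernels.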

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9aa3/5698830/7b5320db7814/CIN2017-8348671.001.jpg