Alam M, Samad M D, Vidyaratne L, Glandon A, Iftekharuddin K M
Department of Computer Science, Tennessee State University, Nashville, TN, 37209.
Neurocomputing (Amst). 2020 Dec 5;417:302-321. doi: 10.1016/j.neucom.2020.07.053. Epub 2020 Jul 26.
This survey reviews state-of-the-art deep neural network architectures, algorithms, and systems in vision and speech applications. Recent advances in deep artificial neural network algorithms and architectures have spurred rapid innovation and development of intelligent vision and speech systems. With the availability of vast amounts of sensor data and cloud computing for processing and training of deep neural networks, and with increased sophistication in mobile and embedded technology, next-generation intelligent systems are poised to revolutionize personal and commercial computing. This survey begins by providing the background and evolution of some of the most successful deep learning models for intelligent vision and speech systems to date. An overview of large-scale industrial research and development efforts is provided to emphasize future trends and prospects of intelligent vision and speech systems. Robust and efficient intelligent systems demand low latency and high fidelity on resource-constrained hardware platforms such as mobile devices, robots, and automobiles. Therefore, this survey also summarizes key challenges and recent successes in running deep neural networks on hardware-restricted platforms, i.e., within limited memory, battery life, and processing capabilities. Finally, emerging applications of vision and speech across disciplines such as affective computing, intelligent transportation, and precision medicine are discussed. To our knowledge, this paper provides one of the most comprehensive surveys on the latest developments in intelligent vision and speech applications from the perspectives of both software and hardware systems. Many of these emerging technologies using deep neural networks show tremendous promise to revolutionize research and development for future vision and speech systems.