Survey on Deep Neural Networks in Speech and Vision Systems.

Author information

Alam M, Samad M D, Vidyaratne L, Glandon A, Iftekharuddin K M

Affiliation

Department of Computer Science, Tennessee State University, Nashville, TN, 37209.

Publication information

Neurocomputing (Amst). 2020 Dec 5;417:302-321. doi: 10.1016/j.neucom.2020.07.053. Epub 2020 Jul 26.

Abstract

This survey presents a review of state-of-the-art deep neural network architectures, algorithms, and systems in vision and speech applications. Recent advances in deep artificial neural network algorithms and architectures have spurred rapid innovation and development of intelligent vision and speech systems. With the availability of vast amounts of sensor data and cloud computing for processing and training deep neural networks, and with increasing sophistication in mobile and embedded technology, next-generation intelligent systems are poised to revolutionize personal and commercial computing. This survey begins by providing the background and evolution of some of the most successful deep learning models for intelligent vision and speech systems to date. An overview of large-scale industrial research and development efforts is provided to emphasize future trends and prospects of intelligent vision and speech systems. Robust and efficient intelligent systems demand low latency and high fidelity on resource-constrained hardware platforms such as mobile devices, robots, and automobiles. Therefore, this survey also provides a summary of key challenges and recent successes in running deep neural networks on hardware-restricted platforms, i.e., within limited memory, battery life, and processing capabilities. Finally, emerging applications of vision and speech across disciplines such as affective computing, intelligent transportation, and precision medicine are discussed. To our knowledge, this paper provides one of the most comprehensive surveys on the latest developments in intelligent vision and speech applications from the perspectives of both software and hardware systems. Many of these emerging technologies using deep neural networks show tremendous promise to revolutionize research and development for future vision and speech systems.
