Makarenko Maksim, Burguete-Lopez Arturo, Wang Qizhou, Giancola Silvio, Ghanem Bernard, Passone Luca, Fratalocchi Andrea
PRIMALIGHT, Faculty of Electrical Engineering; Applied Mathematics and Computational Science, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.
AI & Advanced Computing Lab, EXPEC ARC, Saudi Aramco, 4143 Dhahran Blvd, Gharb Al Dhahran, Dhahran, 34466, Saudi Arabia.
Nat Commun. 2024 Aug 15;15(1):7051. doi: 10.1038/s41467-024-51406-6.
Recent advances in artificial intelligence have significantly expanded capabilities in processing language and images. However, comprehensive understanding of video content remains an unsolved challenge. The main obstacle is the need to process real-time multidimensional video information at data rates exceeding 1 Tb/s, a demand that current hardware technologies cannot meet. This work introduces a hardware-accelerated integrated optoelectronic platform designed specifically for the real-time analysis of multidimensional video. By performing optical information processing within artificial intelligence hardware and combining it with advanced machine vision networks, the platform achieves data processing speeds of 1.2 Tb/s. This capability supports the analysis of hundreds of frequency bands with megapixel spatial resolution at video frame rates, outperforming existing technologies in speed by three to four orders of magnitude. The platform proves effective for AI-driven tasks, such as video semantic segmentation and object understanding, across indoor and aerial scenarios. By overcoming current limits on data processing speed, the platform shows promise for real-time AI video understanding, with potential implications for enhancing human-machine interactions and advancing cognitive processing technologies.
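To give a sense of scale for the Tb/s figures quoted in the abstract, the raw data rate of a multidimensional video stream can be estimated as pixels per frame × spectral bands × frames per second × bits per sample. The sketch below is only an illustration: the function name and every numeric value in it (12 megapixels, 204 bands, 30 frames per second, 16 bits per sample) are assumptions chosen to land near the quoted order of magnitude, not parameters taken from the paper.

```python
def raw_data_rate_bps(pixels_per_frame: int, bands: int, fps: int, bits_per_sample: int) -> int:
    """Return the uncompressed data rate, in bits per second, of a spectral video stream."""
    return pixels_per_frame * bands * fps * bits_per_sample


# Hypothetical stream parameters (assumptions for illustration, not from the source).
PIXELS_PER_FRAME = 12_000_000  # assumed megapixel-class sensor
BANDS = 204                    # assumed "hundreds" of frequency bands
FPS = 30                       # assumed video frame rate
BITS_PER_SAMPLE = 16           # assumed bit depth per band sample

rate_bps = raw_data_rate_bps(PIXELS_PER_FRAME, BANDS, FPS, BITS_PER_SAMPLE)
rate_tbps = rate_bps / 1e12    # convert bits per second to Tb/s

print(f"Estimated raw data rate: {rate_tbps:.3f} Tb/s")
```

Under these assumed values the estimate comes to roughly 1.2 Tb/s, which shows how hundreds of bands at megapixel resolution and video frame rates can reach the terabit-per-second regime.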