College of Information Engineering, Henan Vocational College of Agriculture, Zhengzhou, Henan 451450, China.
Comput Intell Neurosci. 2022 Jun 3;2022:5156532. doi: 10.1155/2022/5156532. eCollection 2022.
In this paper, we conduct an in-depth study of an automatic image processing algorithm for light environment optimization based on a multimodal Recurrent Neural Network (m-RNN). We analyze the structure of m-RNN in light of current research in image processing and natural language processing. We find that m-RNN generates poor descriptions for some images, and we trace the problem to two sources: the image feature extraction component and the processing of text sequence data. Unlike traditional automatic image processing algorithms, this algorithm does not require complex rules to be added manually. Instead, it evaluates and filters the training image collection and then uses m-RNN to generate the automatic image processing model. We also propose an image semantic segmentation algorithm based on multimodal attention and adaptive feature fusion. The algorithm uses multimodal attention to estimate the relative importance of the images and combines this with adaptive feature fusion. It then applies data augmentation to compensate for the small scale of multimodal light environment datasets. The proposed model can bridge the semantic gap between modalities and construct feature relationships across them. This yields an inferable, interpretable, and scalable feature representation of multimodal data. Automatically processing light environment images with multimodal neural networks built on traditional algorithms eliminates manual processing. It also greatly reduces the time and effort that image processing requires.
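The abstract does not give the equations for multimodal attention or adaptive feature fusion. As a rough illustration only, the sketch below assumes a common pattern. Each modality is reduced to a feature vector and scored against a query vector with a scaled dot product. A softmax then turns those scores into weights, and the fused representation is the weighted sum of the modality features. The function names (`modality_attention`, `adaptive_fuse`), the query vector, and the example feature values are all hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_attention(features: Sequence[Vector], query: Vector) -> List[float]:
    """Score each modality's feature vector against the query (scaled dot
    product) and normalize the scores into weights that sum to 1."""
    scale = math.sqrt(len(query))
    scores = [sum(f * q for f, q in zip(feat, query)) / scale for feat in features]
    return softmax(scores)


def adaptive_fuse(features: Sequence[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Fuse per-modality features as an attention-weighted sum.

    `query` is a hypothetical learned parameter. In a trained network it would
    be updated by backpropagation; here it is simply passed in.
    """
    if not features:
        raise ValueError("need at least one modality")
    dim = len(query)
    if any(len(feat) != dim for feat in features):
        raise ValueError("all feature vectors must match the query dimension")
    weights = modality_attention(features, query)
    # Element-wise weighted sum across modalities.
    fused = [sum(w * feat[i] for w, feat in zip(weights, features)) for i in range(dim)]
    return fused, weights


if __name__ == "__main__":
    rgb_feat = [1.0, 0.0, 2.0, 0.5]    # hypothetical pooled RGB image features
    light_feat = [0.2, 1.5, 0.1, 0.3]  # hypothetical illuminance-map features
    query = [1.0, 0.0, 1.0, 0.0]       # hypothetical learned query vector
    fused, weights = adaptive_fuse([rgb_feat, light_feat], query)
    print("modality weights:", [round(w, 3) for w in weights])
    print("fused features:", [round(x, 3) for x in fused])
```

In this toy input, the RGB vector aligns more closely with the query, so it receives most of the weight (roughly 0.79). Because the weights are recomputed for every input, the fusion adapts per sample rather than using fixed mixing coefficients.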