Guangzhou Sport University, Guangzhou, Guangdong 510500, China.
Guangdong Baiyun University, Guangzhou, Guangdong 510450, China.
Comput Intell Neurosci. 2021 Oct 14;2021:7088837. doi: 10.1155/2021/7088837. eCollection 2021.
With advances in computer technology, video description, which combines key techniques from natural language processing and computer vision, has attracted growing research attention. Within this field, describing fast-moving, detail-rich sports videos objectively and efficiently remains a key challenge. Existing video description methods lack language learning information, which leads to sentence errors and loss of visual information in the generated text. To address these problems, a multihead model combining a long short-term memory (LSTM) network with an attention mechanism is proposed for the intelligent description of volleyball videos. The attention mechanism lets the model focus on the salient regions of the video when generating sentences. Comparative experiments with different models show that the model with the attention mechanism can effectively address the loss of visual information. Compared with the LSTM and base models, the proposed multihead model scores higher on all evaluation metrics and significantly improves the quality of the intelligent text description of volleyball videos.
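The abstract does not give the model's equations, so the following is only a minimal sketch of how an attention mechanism can weight video frames while a decoder generates a sentence. It uses generic multi-head scaled dot-product attention, which may differ from the authors' formulation. The names `multihead_attention`, `frames`, and `decoder_state` are hypothetical, and each head reads a contiguous slice of the feature vector instead of using learned projection matrices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multihead_attention(query, frames, num_heads):
    """Attend over per-frame feature vectors using a decoder state as query.

    Assumption: each head uses a contiguous slice of the feature
    dimensions rather than learned projections (a simplification).
    """
    dim = len(query)
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    context, all_weights = [], []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = query[lo:hi]
        # Scaled dot-product score between the query and every frame.
        scores = [
            sum(qi * fi for qi, fi in zip(q, frame[lo:hi])) / math.sqrt(head_dim)
            for frame in frames
        ]
        weights = softmax(scores)
        # Weighted sum of the frame slices gives this head's context vector.
        head_context = [
            sum(w * frame[lo + j] for w, frame in zip(weights, frames))
            for j in range(head_dim)
        ]
        context.extend(head_context)
        all_weights.append(weights)
    return context, all_weights


# Toy example: three frames with 4-dimensional features, two heads.
frames = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [5.0, 5.0, 0.0, 0.0],
]
decoder_state = [1.0, 1.0, 0.0, 0.0]
context, weights = multihead_attention(decoder_state, frames, num_heads=2)
```

In this toy input the first head gives most of its weight to the third frame, whose features align best with the decoder state, while the second head sees an all-zero query slice and spreads its weight evenly. In a full model the resulting `context` vector would be fed to the LSTM decoder at each word-generation step.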