Zhao Hong, Chen Zhiwen, Guo Lan, Han Zeyu
School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu, China.
Network & Information Center, Lanzhou University of Technology, Lanzhou, Gansu, China.
PeerJ Comput Sci. 2022 Mar 16;8:e916. doi: 10.7717/peerj-cs.916. eCollection 2022.
Global encoding of visual features is important for improving description accuracy in video captioning. In this paper, we propose a video captioning method that combines the Vision Transformer (ViT) with reinforcement learning. First, ResNet-152 and ResNeXt-101 are used to extract features from videos. Second, the encoder block of the ViT network is applied to encode the video features. Third, the encoded features are fed into a Long Short-Term Memory (LSTM) network to generate a description of the video content. Finally, the accuracy of the description is further improved by reinforcement-learning fine-tuning. We conducted experiments on MSR-VTT, a benchmark dataset for video captioning. The results show that, compared with current mainstream methods, our model improves by 2.9%, 1.4%, 0.9% and 4.8% on the four evaluation metrics BLEU-4, METEOR, ROUGE-L and CIDEr-D, respectively.
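The pipeline in the abstract passes per-frame CNN features through a Transformer encoder block before LSTM decoding. As a rough illustration of what such an encoder block computes, here is a minimal single-head sketch in NumPy (toy dimensions, random weights, pre-norm residuals); the paper's actual block sizes, head count, and normalization placement are not specified in the abstract, so all of those choices here are assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each feature vector to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def encoder_block(x, Wq, Wk, Wv, Wo, W1, b1, W2, b2):
    """One Transformer encoder block over a sequence of frame features.

    x: (n_frames, d) matrix of CNN features (e.g. from ResNet-152).
    Single-head self-attention + feed-forward, each with a residual
    connection; a real ViT block uses multi-head attention.
    """
    # Self-attention sub-layer (pre-norm residual).
    h = layer_norm(x)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product
    x = x + softmax(scores) @ v @ Wo
    # Position-wise feed-forward sub-layer (ReLU MLP, residual).
    h = layer_norm(x)
    x = x + np.maximum(h @ W1 + b1, 0.0) @ W2 + b2
    return x

# Toy usage: 4 frame features of dimension 8.
rng = np.random.default_rng(0)
n, d = 4, 8
x = rng.normal(size=(n, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
W1, b1 = rng.normal(size=(d, 4 * d)) * 0.1, np.zeros(4 * d)
W2, b2 = rng.normal(size=(4 * d, d)) * 0.1, np.zeros(d)
out = encoder_block(x, Wq, Wk, Wv, Wo, W1, b1, W2, b2)
print(out.shape)  # same shape as the input: (4, 8)
```

The encoded sequence `out` would then serve as input to the LSTM decoder; the reinforcement-learning stage typically optimizes a sequence-level reward (e.g. CIDEr) over sampled captions, which this sketch does not cover.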