Tang Tong, Li Ling, Wu Xiaoyu, Chen Ruizhi, Li Haochen, Lu Guo, Cheng Limin
IEEE Trans Image Process. 2022;31:2463-2477. doi: 10.1109/TIP.2022.3152003. Epub 2022 Mar 18.
Due to the rapid growth of web conferencing, remote screen sharing, and online gaming, screen content has become an important type of internet media, and over 90% of online media interactions are screen-based. Meanwhile, as the main component of screen content, textual information on average occupies over 40% of the whole image across various commonly used screen content datasets. However, textual information is difficult to compress with traditional coding schemes such as HEVC, which assume strong spatial and temporal correlations within the image/video. The state-of-the-art screen content coding (SCC) standard, HEVC-SCC, still adopts a block-based coding framework and does not consider text semantics during compression, so it inevitably blurs text at lower bitrates. In this paper, we propose a general text semantic-aware screen content coding scheme (TSA-SCC) for the ultra-low-bitrate setting. The method detects abrupt pictures in a screen content video (or image), uses neural networks to recognize the textual information in each abrupt picture (including words, positions, font types, font sizes, and font colors), and encodes the text with text coding tools. The remaining pictures, together with the background image obtained by removing the text from the abrupt picture via inpainting, are encoded with HEVC-SCC. Compared with HEVC-SCC, TSA-SCC reduces the bitrate by up to 3× at similar compression quality. Moreover, when encoding screen content video/images at ultra-low bitrates, TSA-SCC achieves much better visual quality while consuming less bitrate.
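The following toy sketch illustrates two stages of the pipeline described above. It is not the authors' implementation. Abrupt pictures are flagged here with a simple mean-absolute-frame-difference threshold, which is our assumption and not the paper's detector. Recognized text is then stored as compact semantic records (word, position, font, size, color) rather than as pixels. The byte layout, the `TextRecord` type, the font-table index, and the use of `zlib` in place of the paper's text coding tools are all hypothetical. Neural text recognition, inpainting, and HEVC-SCC are not modeled.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # grayscale frame: rows of 0-255 pixel values

RECORD_FMT = "<HHBB3B"  # hypothetical layout: x, y, font_id, font_size, r, g, b


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def find_abrupt_pictures(frames: Sequence[Frame], threshold: float = 30.0) -> List[int]:
    """Indices of frames treated as abrupt pictures (assumed thresholding rule).

    Frame 0 is always abrupt; a later frame is abrupt when its mean absolute
    difference from the previous frame exceeds `threshold`.
    """
    if not frames:
        return []
    abrupt = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            abrupt.append(i)
    return abrupt


@dataclass
class TextRecord:
    word: str
    x: int
    y: int
    font_id: int                 # index into a hypothetical font table
    font_size: int
    rgb: Tuple[int, int, int]


def pack_text(records: Sequence[TextRecord]) -> bytes:
    """Serialize text semantics to bytes, then deflate (stand-in for text coding tools)."""
    out = bytearray(struct.pack("<H", len(records)))
    for r in records:
        word = r.word.encode("utf-8")
        out += struct.pack("<B", len(word)) + word  # length-prefixed word (<= 255 bytes)
        out += struct.pack(RECORD_FMT, r.x, r.y, r.font_id, r.font_size, *r.rgb)
    return zlib.compress(bytes(out), 9)


def unpack_text(blob: bytes) -> List[TextRecord]:
    """Inverse of pack_text: the decoder can re-render text from these records."""
    data = zlib.decompress(blob)
    (n,) = struct.unpack_from("<H", data, 0)
    off, records = 2, []
    for _ in range(n):
        (wlen,) = struct.unpack_from("<B", data, off)
        off += 1
        word = data[off:off + wlen].decode("utf-8")
        off += wlen
        x, y, font_id, size, r, g, b = struct.unpack_from(RECORD_FMT, data, off)
        off += struct.calcsize(RECORD_FMT)
        records.append(TextRecord(word, x, y, font_id, size, (r, g, b)))
    return records
```

In this sketch, two words are stored in 32 bytes before deflation. Representing the same text as pixels would require raw samples for every glyph. That size gap is the intuition behind coding text semantically instead of blurring it at low bitrates. In the scheme described above, the non-abrupt pictures and the inpainted background would go to HEVC-SCC, which this sketch does not model.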