

Perception-Based H.264/AVC Video Coding for Resource-Constrained and Low-Bit-Rate Applications

Authors

Kau Lih-Jen, Tseng Chin-Kun, Lee Ming-Xian

Affiliations

Department of Electronic Engineering, National Taipei University of Technology, Taipei 106344, Taiwan.

Tri-Service General Hospital Songshan Branch, Taipei 105309, Taiwan.

Publication information

Sensors (Basel). 2025 Jul 8;25(14):4259. doi: 10.3390/s25144259.

Abstract

With the rapid expansion of Internet of Things (IoT) and edge computing applications, efficient video transmission under constrained bandwidth and limited computational resources has become increasingly critical. In such environments, perception-based video coding plays a vital role in maintaining acceptable visual quality while minimizing bit rate and processing overhead. Although newer video coding standards have emerged, H.264/AVC remains the dominant compression format in many deployed systems, particularly in commercial CCTV surveillance, due to its compatibility, stability, and widespread hardware support. Motivated by these practical demands, this paper proposes a perception-based video coding algorithm specifically tailored for low-bit-rate H.264/AVC applications. By targeting regions most relevant to the human visual system, the proposed method enhances perceptual quality while optimizing resource usage, making it particularly suitable for embedded systems and bandwidth-limited communication channels. In general, regions containing human faces and those exhibiting significant motion are of primary importance for human perception and should receive higher bit allocation to preserve visual quality. To this end, macroblocks (MBs) containing human faces are detected using the Viola-Jones algorithm, which leverages AdaBoost for feature selection and a cascade of classifiers for fast and accurate detection. This approach is favored over deep learning-based models due to its low computational complexity and real-time capability, making it ideal for latency- and resource-constrained IoT and edge environments. Motion-intensive macroblocks are identified by comparing their motion intensity against the average motion level of preceding reference frames. Based on these criteria, a dynamic quantization parameter (QP) adjustment strategy is applied to assign finer quantization to perceptually important regions of interest (ROIs) in low-bit-rate scenarios.
The experimental results show that the proposed method achieves superior subjective visual quality and objective Peak Signal-to-Noise Ratio (PSNR) compared to the standard JM software and other state-of-the-art algorithms under the same bit rate constraints. Moreover, the approach introduces only a marginal increase in computational complexity, highlighting its efficiency. Overall, the proposed algorithm offers an effective balance between visual quality and computational performance, making it well suited for video transmission in bandwidth-constrained, resource-limited IoT and edge computing environments.
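The per-macroblock decision rule described in the abstract (lower QP for face-bearing or motion-intensive MBs, base QP elsewhere) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ROI offset `delta_roi`, the flat MB lists, and the function name are assumptions, and H.264's valid QP range of 0-51 is used for clamping.

```python
def adjust_qp(base_qp, face_mask, motion, avg_ref_motion,
              delta_roi=4, qp_min=0, qp_max=51):
    """Assign finer quantization (lower QP) to perceptually important MBs.

    base_qp        -- frame-level QP chosen by rate control
    face_mask      -- per-MB booleans, True if a face was detected in the MB
                      (e.g. via a Viola-Jones cascade)
    motion         -- per-MB motion intensity (e.g. motion-vector magnitude)
    avg_ref_motion -- average motion level of preceding reference frames
    delta_roi      -- hypothetical QP reduction for ROI macroblocks
    """
    qps = []
    for is_face, m in zip(face_mask, motion):
        # An MB is an ROI if it contains a face or is motion-intensive,
        # i.e. its motion exceeds the reference frames' average.
        is_roi = is_face or m > avg_ref_motion
        qp = base_qp - delta_roi if is_roi else base_qp
        qps.append(max(qp_min, min(qp_max, qp)))  # clamp to the H.264 QP range
    return qps
```

For example, with a base QP of 30 and `delta_roi=4`, a face MB and a high-motion MB both receive QP 26 while a static background MB keeps QP 30: `adjust_qp(30, [True, False, False], [1.0, 5.0, 0.5], 2.0)` yields `[26, 26, 30]`.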


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77bc/12298108/8c8444ef14b9/sensors-25-04259-g001.jpg
