Anique Akhtar, Zhu Li, Geert Van der Auwera
IEEE Trans Image Process. 2024;33:584-594. doi: 10.1109/TIP.2023.3343096. Epub 2024 Jan 8.
Efficient point cloud compression is essential for applications like virtual and mixed reality, autonomous driving, and cultural heritage. This paper proposes a deep learning-based inter-frame encoding scheme for dynamic point cloud geometry compression. We propose a lossy geometry compression scheme that predicts the latent representation of the current frame from the previous frame via a novel feature-space inter-prediction network. The network uses sparse convolutions with hierarchical multiscale 3D feature learning to encode the current frame conditioned on the previous one. A novel predictor network performs motion compensation in the feature domain, mapping the latent representation of the previous frame to the coordinates of the current frame to predict the current frame's feature embedding. The framework transmits the residual between the predicted and actual features, compressing it with a learned probabilistic factorized entropy model. At the receiver, the decoder reconstructs the current frame hierarchically by progressively rescaling the feature embedding. The proposed framework is compared against the state-of-the-art Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC) schemes standardized by the Moving Picture Experts Group (MPEG). It achieves more than 88% BD-Rate (Bjøntegaard Delta Rate) reduction against G-PCCv20 Octree, more than 56% BD-Rate savings against G-PCCv20 Trisoup, more than 62% BD-Rate reduction against V-PCC intra-frame encoding mode, and more than 52% BD-Rate savings against V-PCC P-frame-based inter-frame encoding mode using HEVC. These significant performance gains were cross-checked and verified in the MPEG working group.
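The BD-Rate figures quoted above measure the average bitrate change between two rate-distortion curves at equal quality. As a rough illustration only (not the paper's evaluation code), the sketch below approximates BD-Rate with piecewise-linear interpolation of log10(bitrate) over PSNR and trapezoidal integration, rather than the cubic polynomial fit of the original Bjøntegaard metric; the function and variable names are illustrative.

```python
import math

def _interp_lograte(points, q):
    """Piecewise-linear interpolation of log10(bitrate) at PSNR q.
    points: sorted list of (psnr, log10_bitrate) pairs."""
    q = min(max(q, points[0][0]), points[-1][0])  # clamp for float safety
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= q <= x1:
            t = (q - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("query outside curve range")

def bd_rate(anchor, test, samples=1000):
    """Approximate BD-Rate: percent bitrate change of `test` relative to
    `anchor` at equal PSNR. Each curve is a list of (bitrate, psnr) pairs.
    Negative values mean `test` needs fewer bits (a saving)."""
    a = sorted((p, math.log10(r)) for r, p in anchor)
    b = sorted((p, math.log10(r)) for r, p in test)
    lo = max(a[0][0], b[0][0])   # overlapping PSNR interval
    hi = min(a[-1][0], b[-1][0])
    if hi <= lo:
        raise ValueError("curves do not overlap in PSNR")
    diff = 0.0
    for i in range(samples + 1):
        q = lo + (hi - lo) * i / samples
        w = 0.5 if i in (0, samples) else 1.0  # trapezoidal weights
        diff += w * (_interp_lograte(b, q) - _interp_lograte(a, q))
    avg = diff / samples  # mean log10 rate difference over the interval
    return (10 ** avg - 1.0) * 100.0
```

For example, a codec that reaches every PSNR point at exactly half the anchor's bitrate yields a BD-Rate of -50%; the paper's reported savings of 52% to 88% against G-PCC and V-PCC are computed with the standard Bjøntegaard procedure and cross-checked in MPEG.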