Yang Qi, Ma Zhan, Xu Yiling, Li Zhu, Sun Jun
IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):3015-3029. doi: 10.1109/TPAMI.2020.3047083. Epub 2022 May 5.
Objective quality estimation of media content plays a vital role in a wide range of applications. Though numerous metrics exist for 2D images and videos, comparable metrics are missing for 3D point clouds, whose points are unstructured and non-uniformly distributed. In this paper, we propose GraphSIM, a metric that accurately and quantitatively predicts human perception of point clouds with superimposed geometry and color impairments. The human visual system is more sensitive to high spatial-frequency components (e.g., contours and edges) and weighs local structural variations more heavily than individual point intensities. Motivated by this fact, we use the graph signal gradient as a quality index to evaluate point cloud distortions. Specifically, we first extract geometric keypoints by resampling the reference point cloud's geometry to form an object skeleton. Then, we construct local graphs centered at these keypoints for both the reference and distorted point clouds. Next, we compute three moments of the color gradients between each centered keypoint and all other points in the same local graph to form a local significance similarity feature. Finally, we obtain the similarity index by pooling the local graph significance across all color channels and averaging across all graphs. We evaluate GraphSIM on two large, independent point cloud assessment datasets that cover a wide range of impairments (e.g., resampling, compression, and additive noise). GraphSIM delivers state-of-the-art performance for all distortions, with noticeable gains in predicting the subjective mean opinion score (MOS) over the point-wise distance-based metrics adopted in standardized reference software. Ablation studies further show that GraphSIM generalizes to various scenarios with consistent performance when its key modules and parameters are adjusted.
Models and associated materials will be made available at https://njuvision.github.io/GraphSIM or http://smt.sjtu.edu.cn/papers/GraphSIM.
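The pipeline described in the abstract — local graphs around keypoints, moments of color gradients, and similarity pooling — can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the radius-ball neighborhood, the Gaussian edge weighting, the choice of mean/variance/third central moment, and the SSIM-style stability constant `c` are all simplifications introduced here for clarity.

```python
import numpy as np

def local_graph_moments(points, colors, center_idx, radius=1.0, sigma=1.0):
    """Three moments (mean, variance, third central moment) of color
    gradients on a local graph centered at one keypoint.

    Hypothetical simplification of GraphSIM's local significance features:
    neighbors within `radius` of the keypoint form the local graph, edges
    carry Gaussian weights in geometric distance, and the graph gradient is
    the weighted color difference to the center.
    """
    center = points[center_idx]
    dist = np.linalg.norm(points - center, axis=1)
    mask = (dist <= radius) & (dist > 0)            # neighbors, excluding the center itself
    weights = np.exp(-dist[mask] ** 2 / (2.0 * sigma ** 2))
    grad = weights[:, None] * (colors[mask] - colors[center_idx])  # per-channel gradients
    mean = grad.mean(axis=0)
    var = grad.var(axis=0)
    third = ((grad - mean) ** 3).mean(axis=0)
    return np.concatenate([mean, var, third])       # 3 moments x 3 color channels

def moment_similarity(f_ref, f_dis, c=1e-6):
    """SSIM-style similarity between reference and distorted moment features;
    the constant `c` (an assumption) avoids division by zero."""
    num = 2.0 * f_ref * f_dis + c
    den = f_ref ** 2 + f_dis ** 2 + c
    return float(np.mean(num / den))
```

In this sketch the per-keypoint similarities would then be averaged over all keypoints to yield the final index; identical reference and distorted features give a similarity of exactly 1.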