IEEE Trans Image Process. 2024;33:5740-5754. doi: 10.1109/TIP.2024.3468881. Epub 2024 Oct 9.
We study the visual quality judgments of human subjects on digital human avatars (sometimes referred to as "holograms" in the parlance of virtual reality [VR] and augmented reality [AR] systems) that have been subjected to distortions. We also study the ability of video quality models to predict human judgments. As streaming human avatar videos in VR or AR becomes increasingly common, more advanced human avatar video compression protocols will be required to address the trade-offs between faithfully transmitting high-quality visual representations and adapting to changeable bandwidth scenarios. During transmission over the internet, the perceived quality of compressed human avatar videos can be severely impaired by visual artifacts. To optimize trade-offs between perceptual quality and data volume in practical workflows, video quality assessment (VQA) models are essential tools. However, very few VQA algorithms have been developed specifically to analyze human body avatar videos, due, at least in part, to the dearth of appropriate and comprehensive datasets of adequate size. Towards filling this gap, we introduce the LIVE-Meta Rendered Human Avatar VQA Database, which contains 720 human avatar videos processed using 20 different combinations of encoding parameters, labeled by corresponding human perceptual quality judgments that were collected in six-degrees-of-freedom VR headsets. To demonstrate the usefulness of this new and unique video resource, we use it to study and compare the performances of a variety of state-of-the-art Full Reference and No Reference video quality prediction models, including a new model called HoloQA. As a service to the research community, we publicly release the metadata of the new database at https://live.ece.utexas.edu/research/LIVE-Meta-rendered-human-avatar/index.html.
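For readers who wish to benchmark quality models on the database, the following is a minimal sketch of the evaluation protocol conventional in the VQA literature: SROCC computed on raw model predictions, with PLCC and RMSE computed after a four-parameter logistic mapping of predictions to the MOS scale. It assumes NumPy and SciPy are available; the function names `logistic_4` and `evaluate_vqa_model` are hypothetical illustrations, not code released with the paper.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def logistic_4(x, b1, b2, b3, b4):
    # Standard 4-parameter logistic used in VQA studies to map
    # objective scores onto the MOS scale before computing PLCC/RMSE.
    return (b1 - b2) / (1.0 + np.exp(-(x - b3) / np.abs(b4))) + b2

def evaluate_vqa_model(predicted, mos):
    """Return (SROCC, PLCC, RMSE) for one model's quality predictions
    against subjective mean opinion scores (MOS)."""
    predicted = np.asarray(predicted, dtype=float)
    mos = np.asarray(mos, dtype=float)

    # Rank correlation is computed directly on the raw predictions.
    srocc, _ = stats.spearmanr(predicted, mos)

    # Fit the logistic mapping, then compute linear correlation and
    # RMSE on the mapped scores, as is conventional in VQA benchmarking.
    p0 = [mos.max(), mos.min(), float(predicted.mean()),
          float(predicted.std()) or 1.0]
    params, _ = curve_fit(logistic_4, predicted, mos, p0=p0, maxfev=10000)
    mapped = logistic_4(predicted, *params)
    plcc, _ = stats.pearsonr(mapped, mos)
    rmse = float(np.sqrt(np.mean((mapped - mos) ** 2)))
    return srocc, plcc, rmse
```

In practice, such metrics would be computed per model over the 720 labeled videos (or over held-out splits), and the logistic mapping compensates for each model producing scores on its own arbitrary scale.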