Yinghua Li, Xueqi Dang, Lei Ma, Jacques Klein, Tegawendé F. Bissyandé
SnT Centre, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
The University of Tokyo, Tokyo, Japan.
Empir Softw Eng. 2024;29(5):111. doi: 10.1007/s10664-024-10520-1. Epub 2024 Jul 22.
The widespread adoption of video-based applications across various fields highlights their importance in modern software systems. However, compared to images or text, labeling video test cases to assess system accuracy incurs higher costs due to their temporal structure and larger volume. Test prioritization has emerged as a promising approach to mitigate this labeling cost: it ranks potentially misclassified test inputs first so that such inputs can be identified earlier with limited time and manual labeling effort. However, applying existing prioritization techniques to video test cases faces a key limitation: they do not account for the unique temporal information present in video data. Unlike static image datasets that contain only spatial information, video inputs consist of multiple frames that capture the dynamic changes of objects over time. In this paper, we propose VRank, the first test prioritization approach designed specifically for video test inputs. The fundamental idea behind VRank is that video-type tests with a higher probability of being misclassified by the evaluated deep neural network (DNN) classifier are more likely to reveal faults and should therefore be prioritized higher. To this end, we train a ranking model to predict the probability of a given test input being misclassified by a DNN classifier. This prediction relies on four types of generated features: temporal features (TF), video embedding features (EF), prediction features (PF), and uncertainty features (UF). We rank all test inputs in the target test set by their misclassification probabilities, so that videos with a higher likelihood of being misclassified are prioritized higher. We conducted an empirical evaluation of VRank's performance, involving 120 subjects covering both natural and noisy datasets.
The experimental results reveal that VRank outperforms all compared test prioritization methods, with an average improvement of 5.76% to 46.51% on natural datasets and 4.26% to 53.56% on noisy datasets.
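To make the ranking idea concrete, the sketch below shows how uncertainty features (UF) can be derived from a classifier's softmax outputs and used to order test inputs so that likely-misclassified videos come first. This is a minimal illustration, not the paper's implementation: the feature set is reduced to three common uncertainty signals, and the trained ranking model is replaced by a hypothetical `score_fn` hook (defaulting to a simple feature average) since the actual model and the TF/EF/PF features are not specified here.

```python
import numpy as np

def uncertainty_features(probs):
    """Per-input uncertainty features from a classifier's softmax output.

    probs: (n_tests, n_classes) array of predicted class probabilities.
    Returns an (n_tests, 3) matrix: [least confidence, margin, entropy].
    """
    sorted_p = np.sort(probs, axis=1)[:, ::-1]          # descending per row
    least_conf = 1.0 - sorted_p[:, 0]                   # low top-1 prob -> suspicious
    margin = 1.0 - (sorted_p[:, 0] - sorted_p[:, 1])    # small top-2 gap -> suspicious
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.stack([least_conf, margin, entropy], axis=1)

def prioritize(probs, score_fn=None):
    """Rank test indices so likely-misclassified inputs come first.

    score_fn: stand-in for a trained ranking model mapping features to a
    misclassification probability; defaults to the mean of the features.
    """
    feats = uncertainty_features(probs)
    scores = feats.mean(axis=1) if score_fn is None else score_fn(feats)
    return np.argsort(-scores)  # descending: most suspicious first

# Toy example: three video test inputs, 4-class softmax outputs.
probs = np.array([
    [0.97, 0.01, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.15, 0.10],  # highly uncertain prediction
    [0.70, 0.20, 0.05, 0.05],  # moderately uncertain prediction
])
order = prioritize(probs)
print(order.tolist())  # → [1, 2, 0]: most uncertain input labeled first
```

Under a limited labeling budget, one would then label the videos in `order` until the budget runs out, surfacing misclassified inputs earlier than a random or sequential pass.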