Moreira Daniel, Avila Sandra, Perez Mauricio, Moraes Daniel, Testoni Vanessa, Valle Eduardo, Goldenstein Siome, Rocha Anderson
Institute of Computing, University of Campinas, Brazil.
School of Electrical and Computing Engineering, University of Campinas, Brazil.
Forensic Sci Int. 2016 Nov;268:46-61. doi: 10.1016/j.forsciint.2016.09.010. Epub 2016 Sep 21.
As web technologies and social networks become part of the general public's life, the problem of automatically detecting pornography is on every parent's mind - nobody feels completely safe when their children go online. In this paper, we focus on video-pornography classification, a hard problem in which traditional methods often employ still-image techniques - labeling frames individually prior to a global decision. Frame-based approaches, however, ignore significant cogent information brought by motion. Here, we introduce a space-temporal interest point detector and descriptor called Temporal Robust Features (TRoF). TRoF was custom-tailored for efficient (low processing time and memory footprint) and effective (high classification accuracy and low false-negative rate) motion description, particularly suited to the task at hand. We aggregate the local information extracted by TRoF into a mid-level representation using Fisher Vectors, the state-of-the-art model of Bags of Visual Words (BoVW). We evaluate our strategy, contrasting it both with commercial pornography-detection solutions and with BoVW solutions based upon other space-temporal features from the scientific literature. Performance is assessed on the Pornography-2k dataset, a new and challenging pornographic benchmark comprising 2000 web videos and 140 hours of video footage. The dataset is itself a contribution of this work and is highly assorted: it includes both professional and amateur content and depicts several genres of pornography, from cartoon to live action, with diverse behaviors and ethnicities. The best approach, based on a dense application of TRoF, yields a classification error reduction of almost 79% when compared with the best commercial classifier. A sparse description relying on the TRoF detector is also noteworthy, yielding a classification error reduction of over 69% with a 19× smaller memory footprint than the dense solution, and it can also be implemented to meet real-time requirements.
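The mid-level encoding the abstract describes - a set of local descriptors aggregated into a single Fisher Vector per video - can be sketched as follows. This is a minimal illustration, not the authors' implementation: TRoF is not reproduced here, so randomly generated descriptors stand in for TRoF features, and the encoding follows the standard improved Fisher Vector (gradients with respect to GMM means and variances, with power- and L2-normalization).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode a set of local descriptors as an improved Fisher Vector.

    descriptors: (N, D) array of local features (stand-ins for TRoF here).
    gmm: fitted sklearn GaussianMixture with diagonal covariances (the
         "visual vocabulary" of the BoVW/Fisher Vector framework).
    Returns a vector of length 2*K*D (mean and variance gradients).
    """
    X = np.atleast_2d(descriptors)
    N, _ = X.shape
    q = gmm.predict_proba(X)                 # (N, K) soft assignments
    mu = gmm.means_                          # (K, D) component means
    sigma = np.sqrt(gmm.covariances_)        # (K, D) diagonal std-devs
    w = gmm.weights_                         # (K,) mixture weights

    # Normalized deviations of each descriptor from each component mean
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]   # (N, K, D)

    # Gradients w.r.t. means and variances, averaged over descriptors
    g_mu = (q[:, :, None] * diff).sum(axis=0) / (N * np.sqrt(w)[:, None])
    g_sig = (q[:, :, None] * (diff ** 2 - 1)).sum(axis=0) / (N * np.sqrt(2 * w)[:, None])

    fv = np.hstack([g_mu.ravel(), g_sig.ravel()])
    # Power- and L2-normalization, standard for improved Fisher Vectors
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

# Toy usage: fit a small GMM vocabulary, then encode one "video"
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))           # stand-in training descriptors
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(train)
video_desc = rng.normal(size=(120, 16))      # descriptors from one video
fv = fisher_vector(video_desc, gmm)
print(fv.shape)                              # 2 * 4 components * 16 dims = (128,)
```

The resulting fixed-length vector is what a conventional classifier (e.g., a linear SVM) would consume per video, regardless of how many local descriptors the detector fired on.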