Tang Dongyang, Wang Shang
College of Physical Education, Wuhan Sports University, Wuhan, 430079, Hubei, China.
College of Computer Sciences, Beijing Technology and Business University, Beijing, China.
Sci Rep. 2025 May 26;15(1):18361. doi: 10.1038/s41598-025-02678-5.
We develop a novel computational model that mimics photographers' observation techniques for scene decomposition. Central to our model is a hierarchical structure designed to capture human gaze dynamics accurately, using the Binarized Normed Gradients (BING) objectness measure to identify meaningful scene patches. We introduce a strategy called Locality-preserved and Observer-like Active Learning (LOAL) that constructs gaze shift paths (GSPs) incrementally, allowing user interaction during feature selection. The GSPs are processed by a multi-layer aggregation algorithm, producing deep feature representations that are encoded into a Gaussian mixture model (GMM), which underpins our image retargeting approach. Our empirical analyses, supported by a user study, show that our method significantly outperforms comparable techniques, achieving a precision rate 3.2% higher than the second-best method while halving the testing time. This streamlined approach blends aesthetics with algorithmic efficiency, enhancing AI-driven scene analysis.
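The pipeline described above can be sketched in a minimal, hypothetical form: patches sampled along a gaze shift path are turned into feature vectors, aggregated layer by layer, and encoded with a GMM. All function names, parameters, and the simple statistics-based "features" below are illustrative assumptions standing in for the authors' BING-based patch detection and deep feature extractor, not their actual implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def patch_features(patches):
    # Stand-in for the deep feature extractor: per-channel mean/std
    # statistics for each patch (6-D for RGB patches).
    return np.stack([np.concatenate([p.mean(axis=(0, 1)), p.std(axis=(0, 1))])
                     for p in patches])

def aggregate_layers(feats, n_layers=2):
    # Toy multi-layer aggregation: each layer averages adjacent features
    # along the gaze shift path, shrinking the sequence by one per layer.
    for _ in range(n_layers):
        if len(feats) > 1:
            feats = 0.5 * (feats[:-1] + feats[1:])
    return feats

# Simulate 10 RGB patches (16x16) sampled along one gaze shift path.
gsp_patches = [rng.random((16, 16, 3)) for _ in range(10)]
feats = aggregate_layers(patch_features(gsp_patches))

# Encode the aggregated GSP features with a Gaussian mixture model.
gmm = GaussianMixture(n_components=2, random_state=0).fit(feats)
print(gmm.means_.shape)  # (2, 6): two components over 6-D features
```

In a real system, `patch_features` would be replaced by the learned deep representation, and the fitted GMM parameters (means, covariances, weights) would serve as the compact scene encoding driving retargeting decisions.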