Li Yi, Zhang Luming, Shao Lin
IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):3384-3395. doi: 10.1109/TNNLS.2024.3349515. Epub 2025 Feb 6.
There are hundreds of high- and low-altitude Earth observation satellites that asynchronously capture massive-scale aerial photographs every day. Generally, high-altitude satellites take low-resolution (LR) aerial photos, each covering a considerably large area. In contrast, low-altitude satellites capture high-resolution (HR) aerial photos, each depicting a relatively small area. Accurately discovering the semantics of LR aerial photos is an indispensable capability in computer vision. Nevertheless, it is also a challenging task due to: 1) the difficulty of characterizing human hierarchical visual perception and 2) the prohibitive human effort required to label sufficient training data. To handle these problems, a novel cross-resolution perceptual knowledge propagation (CPKP) framework is proposed, focusing on adapting the visual perceptual experiences deeply learned from HR aerial photos to categorize LR ones. Specifically, by mimicking the human vision system, a novel low-rank model is designed to decompose each LR aerial photo into multiple visually/semantically salient foreground regions coupled with the nonsalient background regions. This model can: 1) produce a gaze-shifting path (GSP) simulating human gaze behavior and 2) derive a deep feature for each GSP. Afterward, a kernel-induced feature selection (FS) algorithm is formulated to obtain a succinct set of deep GSP features that remain discriminative across LR and HR aerial photos. Based on the selected features, the labels from LR and HR aerial photos are collaboratively utilized to train a linear classifier for categorizing the LR ones. It is worth emphasizing that such a CPKP mechanism can effectively optimize the linear classifier training, as labels of HR aerial photos are acquired more conveniently in practice. Comprehensive visualization results and comparative studies validate the superiority of our approach.
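The decomposition step described above resembles a generic robust-PCA-style low-rank plus sparse split, in which the nonsalient background is modeled as low-rank and the salient foreground regions as sparse residuals. The sketch below illustrates that generic decomposition via inexact augmented Lagrange multipliers; it is not the authors' exact model, and the function names and parameter defaults are assumptions for illustration only.

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: shrink singular values by tau
    # to obtain a low-rank estimate.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    # Elementwise soft thresholding: promotes sparsity in the
    # salient-foreground component.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(D, lam=None, tol=1e-7, max_iter=500):
    """Split matrix D into low-rank L (background) + sparse S
    (salient foreground) via inexact ALM; a generic stand-in for
    the paper's low-rank decomposition."""
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    norm_D = np.linalg.norm(D, ord='fro')
    mu = 1.25 / np.linalg.norm(D, ord=2)
    rho = 1.5
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)  # Lagrange multipliers
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        R = D - L - S
        Y += mu * R
        mu *= rho
        if np.linalg.norm(R, ord='fro') / norm_D < tol:
            break
    return L, S
```

In this reading, D would hold patch-level features of one aerial photo; the sparse component S highlights candidate salient regions from which a gaze-shifting path could be traced, though the paper's actual GSP construction is more elaborate.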
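To make the cross-resolution training idea concrete (select features that stay discriminative across both resolutions, then train one linear classifier on pooled HR and LR labels), here is a hedged sketch using off-the-shelf scikit-learn components. The mutual-information scorer is a stand-in for the paper's kernel-induced FS algorithm, and all data arrays are hypothetical placeholders.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-ins: deep GSP feature matrices and labels for
# abundantly labeled HR photos and scarcely labeled LR photos.
X_hr, y_hr = rng.standard_normal((500, 512)), rng.integers(0, 10, 500)
X_lr, y_lr = rng.standard_normal((40, 512)), rng.integers(0, 10, 40)

# Pool both resolutions and keep a succinct feature subset that remains
# discriminative on the combined data (stand-in for kernel-induced FS).
X = np.vstack([X_hr, X_lr])
y = np.concatenate([y_hr, y_lr])
selector = SelectKBest(mutual_info_classif, k=64).fit(X, y)

# Train one linear classifier on the pooled labels, then apply it to
# categorize LR aerial photos.
clf = LogisticRegression(max_iter=1000).fit(selector.transform(X), y)
lr_predictions = clf.predict(selector.transform(X_lr))
```

The design point this illustrates is the one the abstract emphasizes: because HR labels are cheap to acquire, pooling them with scarce LR labels gives the linear classifier far more supervision than LR data alone would provide.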