IEEE Trans Neural Netw Learn Syst. 2022 Aug;33(8):4017-4030. doi: 10.1109/TNNLS.2021.3055548. Epub 2022 Aug 3.
Categorizing aerial photographs with varied weather/lighting conditions and sophisticated geomorphic factors is a key module in autonomous navigation, environmental evaluation, and related applications. Previous image recognizers cannot fulfill this task due to three challenges: 1) localizing visually/semantically salient regions within each aerial photograph in a weakly annotated context, since pixel-level annotation requires unaffordable human resources; 2) aerial photographs generally carry multiple informative attributes (e.g., clarity and reflectivity), which must be encoded for better aerial photograph modeling; and 3) designing a cross-domain knowledge transferal module to enhance aerial photograph perception, since multiresolution aerial photographs are taken asynchronously and are mutually complementary. To handle the above problems, we propose to optimize aerial photograph feature learning by leveraging the low-resolution spatial composition to enhance the deep learning of high-resolution perceptual features. More specifically, we first extract many BING-based object patches (Cheng et al., 2014) from each aerial photograph. A weakly supervised ranking algorithm then selects a few semantically salient ones by seamlessly incorporating multiple aerial photograph attributes. Toward an interpretable aerial photograph recognizer indicative of human visual perception, we construct a gaze shifting path (GSP) by linking the top-ranking object patches and, subsequently, derive the deep GSP feature. Finally, a cross-domain multilabel SVM is formulated to categorize each aerial photograph. It leverages the global feature from low-resolution counterparts to optimize the deep GSP feature from a high-resolution aerial photograph. Comparative results on our compiled million-scale aerial photograph set demonstrate the competitiveness of our approach.
Besides, the eye-tracking experiment has shown that our ranking-based GSPs are over 92% consistent with the real human gaze shifting sequences.
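The patch-ranking and path-linking steps described above can be sketched roughly as follows. This is a minimal illustration under assumed interfaces: the saliency scores, the patch representation, and the greedy nearest-neighbor linking rule are hypothetical stand-ins, not the paper's actual BING proposal generator or weakly supervised ranking algorithm.

```python
# Hypothetical sketch: rank candidate object patches by a precomputed
# saliency score, keep the top-k, and link them into a gaze shifting
# path (GSP) by greedy nearest-neighbor ordering. The scoring and
# linking rules are assumptions for illustration only.
import math

def select_top_patches(patches, k=5):
    """patches: list of dicts with 'center' (x, y) and 'score'.
    Returns the k highest-scoring patches, best first."""
    return sorted(patches, key=lambda p: p["score"], reverse=True)[:k]

def link_gsp(patches):
    """Order patches into a path, starting from the top-scored patch
    and repeatedly jumping to the spatially nearest remaining patch."""
    path = [patches[0]]
    remaining = list(patches[1:])
    while remaining:
        last = path[-1]["center"]
        nxt = min(remaining, key=lambda p: math.dist(last, p["center"]))
        remaining.remove(nxt)
        path.append(nxt)
    return path
```

In the paper's full pipeline, a deep feature would then be derived along this path and fed, together with a global low-resolution feature, to the cross-domain multilabel SVM.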