Huo Shoujun, Sun Yue, Guo Qinghua, Tan Tao, Bolhuis J Elizabeth, Bijma Piter, de With Peter H N
Department of Electrical Engineering, Eindhoven University of Technology, 5612 AP Eindhoven, The Netherlands.
Department of Mathematics and Computer Science, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands.
Foods. 2022 Dec 23;12(1):84. doi: 10.3390/foods12010084.
In livestock breeding, continuous and objective monitoring of animals cannot feasibly be done manually because of the large scale of breeding operations and the high cost of labour. Computer vision technology can generate accurate, real-time information about individual animals or animal groups from video surveillance. However, frequent occlusion between animals and changes in appearance features caused by varying lighting conditions make single-camera systems less attractive. To address these issues, we propose a double-camera system and image registration algorithms that spatially fuse the information from different viewpoints. This paper presents a deformable, learning-based registration framework in which the input image pairs are first linearly pre-registered. An unsupervised convolutional neural network is then employed to fit the mapping from one view to the other, using a large number of unlabelled samples for training. The learned parameters are subsequently used in a semi-supervised network and fine-tuned with a small number of manually annotated landmarks. The actual pixel displacement error is introduced as a complement to an image similarity measure. The proposed fine-tuned method is evaluated on real farming datasets and achieves significantly lower registration errors than commonly used feature-based and intensity-based methods. The approach also reduces the registration time for an unseen image pair to less than 0.5 s. The proposed method provides a high-quality reference processing step for improving subsequent tasks such as multi-object tracking and behaviour recognition of animals.
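The fine-tuning objective described in the abstract, an image similarity measure complemented by the pixel displacement error, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: mean squared error stands in for the similarity measure, the displacement field is stored as `displacement[y][x] = (dx, dy)`, landmarks are integer pixel coordinates, and the function names and the `weight` parameter are hypothetical.

```python
import math


def mse_similarity(fixed, warped):
    """Mean squared intensity difference between two equally sized 2-D images.

    Assumption: MSE is only a stand-in for the similarity measure.
    """
    total = 0.0
    count = 0
    for row_f, row_w in zip(fixed, warped):
        for f, w in zip(row_f, row_w):
            total += (f - w) ** 2
            count += 1
    return total / count


def landmark_error(moving_pts, fixed_pts, displacement):
    """Mean Euclidean distance, in pixels, between displaced landmarks in the
    moving view and their annotated counterparts in the fixed view.

    Assumption: displacement[y][x] holds the (dx, dy) shift for pixel (x, y).
    """
    dists = []
    for (x, y), (fx, fy) in zip(moving_pts, fixed_pts):
        dx, dy = displacement[y][x]
        dists.append(math.hypot(x + dx - fx, y + dy - fy))
    return sum(dists) / len(dists)


def semi_supervised_loss(fixed, warped, moving_pts, fixed_pts,
                         displacement, weight=0.5):
    """Similarity term plus a weighted landmark displacement term.

    `weight` is a hypothetical balancing factor, not a value from the paper.
    """
    return (mse_similarity(fixed, warped)
            + weight * landmark_error(moving_pts, fixed_pts, displacement))
```

In a training loop, the similarity term can be computed for every image pair, whereas the landmark term applies only to the small annotated subset, which is why the landmark error acts as a complement rather than a replacement.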