Arent Ilja, Schmidt Florian P, Botsch Mario, Dürr Volker
Biological Cybernetics, Faculty of Biology, Bielefeld University, Bielefeld, Germany.
Center for Cognitive Interaction Technology, Bielefeld University, Bielefeld, Germany.
Front Behav Neurosci. 2021 Apr 22;15:637806. doi: 10.3389/fnbeh.2021.637806. eCollection 2021.
Motion capture of unrestrained moving animals is a major analytic tool in neuroethology and behavioral physiology. At present, several motion capture methodologies have been developed, all of which have particular limitations regarding experimental application. Whereas marker-based motion capture systems are very robust and easily adjusted to suit different setups, tracked species, or body parts, they cannot be applied in experimental situations where markers obstruct the natural behavior (e.g., when tracking delicate, elastic, and/or sensitive body structures). On the other hand, marker-less motion capture systems typically require setup- and animal-specific adjustments, for example by means of tailored image processing, decision heuristics, and/or machine learning of specific sample data. Among the latter, deep-learning approaches have become very popular because of their applicability to virtually any sample of video data. Nevertheless, concise evaluation of their training requirements has rarely been done, particularly with regard to the transfer of trained networks from one application to another. To address this issue, the present study uses insect locomotion as a showcase example for a systematic evaluation of variation and augmentation of the training data. For that, we use artificially generated video sequences with known combinations of observed, real animal postures and randomized body position, orientation, and size. Moreover, we evaluate the generalization ability of networks that have been pre-trained on synthetic videos to video recordings of real walking insects, and estimate the benefit in terms of reduced requirement for manual annotation. We show that tracking performance is only slightly affected by scaling factors ranging from 0.5 to 1.5. As expected from convolutional networks, the translation of the animal has no effect.
On the other hand, we show that sufficient variation of rotation in the training data is essential for performance, and make concise suggestions about how much variation is required. Our results on transfer from synthetic to real videos show that pre-training reduces the amount of necessary manual annotation by about 50%.
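The augmentation strategy evaluated here (randomized body position, orientation, and size, with scale factors between 0.5 and 1.5) can be sketched as a random 2D affine transform applied to annotated keypoint coordinates. The sketch below is illustrative only; the function names, parameter ranges beyond those stated in the abstract, and the scale-rotate-translate ordering are assumptions, not the authors' implementation:

```python
import numpy as np

def random_affine_params(rng, scale_range=(0.5, 1.5), max_shift=50.0):
    """Draw a random scale, rotation, and translation.

    scale_range mirrors the 0.5-1.5 factors evaluated in the study;
    max_shift (pixels) is an illustrative placeholder for the
    randomized body position.
    """
    scale = rng.uniform(*scale_range)
    theta = rng.uniform(0.0, 2.0 * np.pi)  # full rotational variation
    shift = rng.uniform(-max_shift, max_shift, size=2)
    return scale, theta, shift

def apply_affine(points, scale, theta, shift):
    """Apply scale -> rotate -> translate to an (N, 2) keypoint array."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s],
                    [s,  c]])
    return (scale * points) @ rot.T + shift

# Example: augment one annotated posture into a randomized training sample.
rng = np.random.default_rng(42)
keypoints = np.array([[10.0, 0.0], [0.0, 5.0]])  # hypothetical leg-tip positions
augmented = apply_affine(keypoints, *random_affine_params(rng))
```

Because a convolutional network is translation-equivariant, the `shift` component mainly serves to decorrelate the animal from the image center, whereas (per the results above) sufficient coverage of `theta` is what actually drives tracking performance.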