Bain Max, Nagrani Arsha, Schofield Daniel, Berdugo Sophie, Bessa Joana, Owen Jake, Hockings Kimberley J, Matsuzawa Tetsuro, Hayashi Misato, Biro Dora, Carvalho Susana, Zisserman Andrew
Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, UK.
Primate Models for Behavioural Evolution Lab, Institute of Human Sciences, School of Anthropology and Museum Ethnography, University of Oxford, Oxford, UK.
Sci Adv. 2021 Nov 12;7(46):eabi4883. doi: 10.1126/sciadv.abi4883.
Large video datasets of wild animal behavior are crucial for producing longitudinal research and accelerating conservation efforts; however, large-scale behavior analyses remain severely constrained by time and resources. We present a deep convolutional neural network approach and fully automated pipeline to detect and track two audiovisually distinctive actions in wild chimpanzees: buttress drumming and nut cracking. Using camera trap and direct video recordings, we train action recognition models on the audio and visual signatures of both behaviors, attaining high average precision (0.87 for buttress drumming and 0.85 for nut cracking), and demonstrate the potential for behavioral analysis using the automatically parsed video. Our approach produces the first automated audiovisual action recognition of wild primate behavior, setting a milestone for exploiting large datasets in ethology and conservation.
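The average precision values reported in the abstract are a standard ranking metric for detection tasks. The sketch below is not the authors' code; it only illustrates, under assumed per-clip labels and detector scores (both hypothetical), how average precision for one behavior class such as nut cracking could be computed with scikit-learn.

```python
# Minimal sketch of average-precision evaluation for one behaviour class.
# y_true and y_score are hypothetical placeholders, not data from the paper.
import numpy as np
from sklearn.metrics import average_precision_score

# Per-clip ground-truth labels (1 = behaviour present) and detector
# confidence scores for a single behaviour class, e.g. nut cracking.
y_true = np.array([1, 0, 0, 1, 1, 0, 0, 1, 0, 0])
y_score = np.array([0.92, 0.10, 0.35, 0.80, 0.66, 0.05, 0.48, 0.71, 0.22, 0.15])

ap = average_precision_score(y_true, y_score)
print(f"average precision: {ap:.2f}")
```

In a per-class evaluation like the one summarized above, this computation would be repeated independently for each behavior (buttress drumming and nut cracking), yielding one average-precision value per class.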