Computational Statistics and Machine Learning, University College London, London, UK.
PLoS One. 2012;7(11):e49120. doi: 10.1371/journal.pone.0049120. Epub 2012 Nov 19.
We propose a new method, based on machine learning techniques, for analysing a combination of continuous data from dataloggers and a sample of contemporaneous behaviour observations. This data combination gives biologists an opportunity to study behaviour at a previously unattainable level of detail and accuracy; however, continuously recorded data are of little use unless the resulting large volumes of raw data can be reliably translated into actual behaviour. We address this problem by combining a Support Vector Machine with a Hidden Markov Model, which together allow us to classify an animal's behaviour by using a small set of field observations to calibrate continuously recorded activity data. The classified data can then be used to quantify behaviour over extended periods, including times when direct observation is difficult or impossible. We demonstrate the usefulness of the method by applying it to data from six cheetah (Acinonyx jubatus) in the Okavango Delta, Botswana. Accelerometers embedded in GPS radio-collars recorded cumulative activity scores every five minutes for around one year on average. Direct behaviour observations of each of the six cheetah were collected in the field over comparatively short periods. Using this approach, we classify each five-minute activity score into one of three key behaviours (feeding, mobile and stationary), creating a continuous behavioural sequence for the entire period over which the collars were deployed. Cross-validation shows the overall accuracy of our classifier to be 83% to 94%, although accuracy for individual classes decreases as the sample size of direct observations decreases. We demonstrate how these processed data can be used to study behaviour, identifying seasonal and gender differences in daily activity and feeding times. These results are unlike any that could be obtained with traditional approaches, in both accuracy and detail.