Salk Institute for Biological Studies, La Jolla, California, United States of America.
Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, California, United States of America.
PLoS One. 2020 Dec 16;15(12):e0241696. doi: 10.1371/journal.pone.0241696. eCollection 2020.
Automated quantification of behavior is increasingly prevalent in neuroscience research. Human judgments can influence machine-learning-based behavior classification at multiple steps in the process, for both supervised and unsupervised approaches. Such steps include the design of the algorithm for machine learning, the methods used for animal tracking, the choice of training images, and the benchmarking of classification outcomes. However, how these design choices contribute to the interpretation of automated behavioral classifications has not been extensively characterized. Here, we quantify the effects of experimenter choices on the outputs of automated classifiers of Drosophila social behaviors. Drosophila behaviors contain a considerable degree of variability, which was reflected in the confidence levels associated with both human and computer classifications. We found that a diversity of sex combinations and tracking features was important for robust performance of the automated classifiers. In particular, features concerning the relative position of flies contained useful information for training a machine-learning algorithm. These observations shed light on the importance of human influence on tracking algorithms, the selection of training images, and the quality of annotated sample images used to benchmark the performance of a classifier (the 'ground truth'). Evaluation of these factors is necessary for researchers to accurately interpret behavioral data quantified by a machine-learning algorithm and to further improve automated classifications.
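The abstract refers to benchmarking a classifier's output against human-annotated "ground truth". The sketch below illustrates one common way to do this: frame-wise precision, recall and F1. It assumes one binary label per video frame for a single behavior. The function name `benchmark` and the choice of metrics are illustrative assumptions, not necessarily the procedure used in this study.

```python
from typing import Dict, Sequence


def benchmark(predicted: Sequence[bool], ground_truth: Sequence[bool]) -> Dict[str, float]:
    """Frame-wise precision, recall and F1 of a classifier against human labels.

    Assumption: each sequence holds one boolean per video frame, True when the
    behavior is scored as present in that frame (illustrative, not the paper's format).
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    pairs = list(zip(predicted, ground_truth))
    tp = sum(1 for p, g in pairs if p and g)        # classifier and human both say yes
    fp = sum(1 for p, g in pairs if p and not g)    # classifier yes, human no
    fn = sum(1 for p, g in pairs if g and not p)    # human yes, classifier no
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = benchmark([True, True, False, False, True], [True, False, False, True, True])
    print(scores)
```

In this usage example, two frames are true positives, one is a false positive and one is a false negative, so precision, recall and F1 are each 2/3. Because the metrics depend entirely on the human labels, variability in those annotations carries directly into the benchmark. This is the point the abstract makes about the quality of the ground truth.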