Saravanan Varun, Berman Gordon J, Sober Samuel J
Neuroscience Graduate Program, Graduate Division of Biological and Biomedical Sciences, Laney Graduate School, Emory University, 30322.
Department of Biology, Emory University, 30322.
Neuron Behav Data Anal Theory. 2020;3(5). Epub 2020 Jul 21.
A common feature of many neuroscience datasets is a hierarchical structure; most commonly, the activity of multiple neurons is recorded in multiple animals across multiple trials. Accordingly, the measurements constituting the dataset are not independent, even though the traditional statistical analyses often applied in such cases (e.g., Student's t-test) treat them as such. The hierarchical bootstrap has been shown to be an effective tool for accurately analyzing such data. Although it has been used extensively in the statistical literature, its use is not widespread in neuroscience, despite the ubiquity of hierarchical datasets. In this paper, we illustrate the intuitiveness and utility of this approach for analyzing hierarchically nested datasets. We use simulated neural data to show that traditional statistical tests can produce a false positive rate of over 45%, even when the Type-I error rate is set at 5%. While summarizing data across non-independent points (or lower levels) can potentially fix this problem, this approach greatly reduces the statistical power of the analysis. The hierarchical bootstrap, when applied sequentially over the levels of the hierarchical structure, keeps the Type-I error rate within the intended bound and retains more statistical power than summarizing methods. We conclude by demonstrating the effectiveness of the method in two real-world examples: first, analyzing singing data in male Bengalese finches (Lonchura striata var. domestica), and second, quantifying changes in behavior under optogenetic control in flies (Drosophila melanogaster).
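The sequential resampling described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The nested-list layout (`data[animal][neuron]` holding per-trial values), every function name, the toy noise levels in `simulate_group`, and the one-sided `p_boot` summary (the fraction of bootstrap replicates in which group B's mean does not exceed group A's) are hypothetical choices made here. `flat_bootstrap` is included only for contrast. It pools every trial as if all trials were independent, which is the assumption the abstract cautions against.

```python
import random
import statistics
from typing import List

# Assumed layout for illustration: data[animal][neuron] is a list of
# per-trial measurements (three nested levels).
Nested = List[List[List[float]]]


def resample_once(data: Nested, rng: random.Random) -> float:
    """Return the mean of one hierarchically resampled dataset.

    Resampling with replacement is applied level by level: animals first,
    then neurons within each drawn animal, then trials within each drawn neuron.
    """
    values: List[float] = []
    for animal in rng.choices(data, k=len(data)):
        for neuron in rng.choices(animal, k=len(animal)):
            values.extend(rng.choices(neuron, k=len(neuron)))
    return statistics.fmean(values)


def hierarchical_bootstrap(data: Nested, n_boot: int = 1000, seed: int = 0) -> List[float]:
    """Bootstrap distribution of the mean under hierarchical resampling."""
    rng = random.Random(seed)
    return [resample_once(data, rng) for _ in range(n_boot)]


def flat_bootstrap(data: Nested, n_boot: int = 1000, seed: int = 0) -> List[float]:
    """For contrast only: pool all trials and resample them as if independent."""
    rng = random.Random(seed)
    pooled = [x for animal in data for neuron in animal for x in neuron]
    return [statistics.fmean(rng.choices(pooled, k=len(pooled))) for _ in range(n_boot)]


def p_boot(group_a: Nested, group_b: Nested, n_boot: int = 1000, seed: int = 0) -> float:
    """Hypothetical one-sided summary: fraction of replicates with mean(B) <= mean(A).

    Values near 0 suggest that B's mean is reliably larger than A's.
    """
    boot_a = hierarchical_bootstrap(group_a, n_boot, seed)
    boot_b = hierarchical_bootstrap(group_b, n_boot, seed + 1)
    return sum(b <= a for a, b in zip(boot_a, boot_b)) / n_boot


def simulate_group(n_animals: int, n_neurons: int, n_trials: int,
                   effect: float, rng: random.Random) -> Nested:
    """Toy data: each animal and each neuron gets its own random offset (assumed SDs)."""
    group: Nested = []
    for _ in range(n_animals):
        animal_offset = rng.gauss(0.0, 1.0)  # shared by every neuron in this animal
        animal = []
        for _ in range(n_neurons):
            neuron_offset = rng.gauss(0.0, 0.5)  # shared by every trial of this neuron
            animal.append([effect + animal_offset + neuron_offset + rng.gauss(0.0, 0.2)
                           for _ in range(n_trials)])
        group.append(animal)
    return group


if __name__ == "__main__":
    rng = random.Random(42)
    control = simulate_group(4, 10, 20, effect=0.0, rng=rng)
    treated = simulate_group(4, 10, 20, effect=0.0, rng=rng)  # no true effect
    hier = hierarchical_bootstrap(control)
    flat = flat_bootstrap(control)
    print("SD of bootstrap means, hierarchical:", round(statistics.stdev(hier), 3))
    print("SD of bootstrap means, flat:        ", round(statistics.stdev(flat), 3))
    print("p_boot (treated > control):", p_boot(control, treated))
```

Because whole animals are redrawn first, between-animal variability carries through to the spread of the bootstrap means. Pooling all trials, as `flat_bootstrap` does, discards that variability. The result is an overly narrow distribution and overconfident comparisons between groups.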