Cocatre-Zilgien J H, Delcomyn F
Department of Entomology, University of Illinois, Urbana 61801.
Biol Cybern. 1988;59(6):367-77. doi: 10.1007/BF00336110.
To determine which statistical tests can validly be applied to data that describe a temporal relationship between two or more repetitive movements by an animal, we empirically evaluated seven two-sample tests that seemed potentially useful: Student's t test, the Watson-Williams test for means, the variance-ratio F test, the Watson-Williams test for the concentration parameter k, the Wallraff test, the Mann-Whitney test and the Watson U² test. Evaluations were carried out on the timing (phases) of bursts of muscular activity in one leg relative to those in another during free walking in cockroaches. Each statistical test was evaluated by randomly dividing a single parent set of data into two subsets, each containing about half of the original data set. This division was repeated 400 times, generating 400 different pairs of subsets. Each statistical test was applied separately to the pairs of subsets to test the null hypothesis that the two samples of each pair came from the same population; this procedure generated 400 statistics for each test, one for each pair of subsets. An estimate of the reliability of each statistical test was obtained by comparing the number of times the test actually indicated a significant difference between subsets with the number of times it would be expected to do so by chance (20 out of 400 when tested at the 5% level of significance). This procedure was repeated on ten different sets of data. The outcome of the evaluation suggested that, from an empirical point of view, Student's t, the Mann-Whitney, the Wallraff and the Watson U² tests may be useful in assessing differences among the data we analyzed. The variance-ratio F test and the Watson-Williams test for the concentration parameter k were clearly not usable. The Watson-Williams test for means might be useful in some circumstances. Performing an arcsine transformation of the data did not significantly alter these results. Possible causes of the inapplicability of some of these tests to phase data are discussed.
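The split-half evaluation described above can be illustrated with a short sketch. This is not the authors' code: the function names (`mann_whitney_p`, `count_false_positives`), the synthetic phase values, the random seeds and the use of a normal approximation without tie correction for the Mann-Whitney test are all assumptions made for illustration. The sketch randomly divides one parent data set into two halves 400 times, applies a two-sample test to each pair, and compares the number of significant results with the 20 expected at the 5% level.

```python
import math
import random


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value (normal approximation, ties assumed absent)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the ranks (starting at 1) that belong to the first sample.
    r1 = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail area.
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(data, test_p, n_splits=400, alpha=0.05, seed=0):
    """Split `data` into random halves `n_splits` times and count rejections."""
    rng = random.Random(seed)
    half = len(data) // 2
    rejections = 0
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        if test_p(shuffled[:half], shuffled[half:]) < alpha:
            rejections += 1
    return rejections


# Hypothetical parent data set: 200 synthetic phase values centred on 0.5.
_rng = random.Random(1)
phases = [_rng.gauss(0.5, 0.08) for _ in range(200)]

n_splits, alpha = 400, 0.05
observed = count_false_positives(phases, mann_whitney_p, n_splits, alpha)
expected = alpha * n_splits  # 20 rejections expected by chance at the 5% level
print(f"observed rejections: {observed}, expected about {expected:.0f}")
```

Because both halves come from the same parent set, every rejection is a false positive; a test whose observed count lies far from the expected 20 would be suspect for this kind of data. Any other two-sample test that returns a p-value could be passed in place of `mann_whitney_p`.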