United States Department of Agriculture, Agricultural Research Service, Byron, GA 31008, USA.
Phytopathology. 2010 Oct;100(10):1030-41. doi: 10.1094/PHYTO-08-09-0220.
Comparing treatment effects by hypothesis testing is a common practice in plant pathology. Nearest percent estimates (NPEs) of disease severity were compared with Horsfall-Barratt (H-B) scale data to explore whether the assessment method affects hypothesis testing. A simulation model based on field-collected data from leaves with 0 to 60% disease severity was used. In this model, the relationship between NPEs and actual severity was linear, a hyperbolic function described the relationship between the standard deviation of the rater mean NPE and actual disease severity, and the frequency of raters' NPEs at a given actual severity was assumed to follow a lognormal distribution. The simulation showed that the standard deviations of mean NPEs were consistently similar to the original rater standard deviation from the field-collected data. However, the standard deviations of the H-B scale data deviated from the original rater standard deviation, particularly at 20 to 50% severity, where the H-B scale grade intervals are widest. It is therefore over this range that differences in hypothesis testing are most likely to occur. To explore this, two hypothetical, normally distributed severity populations were compared with a t test using NPEs and H-B midpoint data. When the null hypothesis (H0) was false, NPE data had a higher probability of rejecting it than H-B scale data. Greater sample size increased the probability of rejecting H0 for both methods, but the H-B scale data required up to a 50% greater sample size to attain the same probability of rejecting a false H0 as NPEs. The larger sample compensates for the increased sample variance caused by the inaccuracy that H-B midpoint scaling introduces into individual estimates. As expected, various population characteristics influenced the probability of rejecting H0, including the difference between the two severity distribution means, their variability, and the ability of the raters.
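The simulate-then-test procedure described above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' model: the abstract does not give the fitted coefficients, so the linear bias (`intercept`, `slope`), the saturating-hyperbola SD (`sd_max`, `sd_half`), the population means and SDs, and all helper names (`simulate_npe`, `hb_midpoint`, `rejection_rates`) are hypothetical placeholders. It further assumes that each leaf is assessed once, that a rater's H-B grade is the interval containing their NPE (0, 0 to 3, 3 to 6, 6 to 12, 12 to 25, 25 to 50, 50 to 75, 75 to 88, 88 to 94, 94 to 97, 97 to 100 and 100%), and that a value on an interval boundary falls in the lower grade. The Student t p-value is obtained by numerical integration so that no third-party package is needed.

```python
import math
import random

# H-B intervals as (upper bound %, midpoint %); exactly 0% and 100% are separate grades.
# Assumption: a value on a boundary (e.g. 25%) falls in the lower interval.
HB_INTERVALS = [(3, 1.5), (6, 4.5), (12, 9.0), (25, 18.5), (50, 37.5),
                (75, 62.5), (88, 81.5), (94, 91.0), (97, 95.5), (100, 98.5)]


def hb_midpoint(severity):
    """Map a percent estimate to the midpoint of its H-B grade interval."""
    if severity <= 0:
        return 0.0
    if severity >= 100:
        return 100.0
    return next(mid for upper, mid in HB_INTERVALS if severity <= upper)


def simulate_npe(actual, rng, intercept=1.0, slope=1.0, sd_max=15.0, sd_half=20.0):
    """One hypothetical rater's nearest percent estimate (NPE) of `actual` % severity.

    Placeholder model: mean NPE linear in actual severity, SD rising along a
    saturating hyperbola, NPEs lognormal. Coefficients are NOT the paper's.
    """
    mean = max(intercept + slope * actual, 0.1)          # linear bias, kept > 0
    sd = max(sd_max * actual / (sd_half + actual), 0.5)  # hyperbolic SD, floored
    sigma2 = math.log(1.0 + (sd / mean) ** 2)            # lognormal with this mean and SD
    est = rng.lognormvariate(math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2))
    return float(min(round(est), 100))                   # nearest percent, capped at 100


def t_two_sided_p(t, df, steps=200):
    """Two-sided Student t p-value via Simpson's rule after x = sqrt(df) * tan(theta)."""
    theta = math.atan(abs(t) / math.sqrt(df))

    def f(th):
        return math.cos(th) ** (df - 1)  # t density in the theta variable, unnormalized

    h = theta / steps
    simpson = f(0.0) + f(theta) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, steps))
    # Integral of cos^(df-1) over [0, pi/2], used to normalize.
    total = 0.5 * math.sqrt(math.pi) * math.exp(math.lgamma(df / 2) - math.lgamma((df + 1) / 2))
    return min(1.0, max(0.0, 1.0 - simpson * h / 3.0 / total))


def pooled_t_test_p(a, b):
    """Pooled-variance two-sample t test; returns the two-sided p-value."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    df = n1 + n2 - 2
    sp2 = (sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)) / df
    if sp2 == 0.0:  # e.g. every H-B midpoint in both groups identical
        return 0.0 if m1 != m2 else 1.0
    return t_two_sided_p((m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2)), df)


def rejection_rates(n, mu1=20.0, mu2=25.0, pop_sd=8.0, reps=500, alpha=0.05, seed=1, **rater):
    """Fraction of simulated experiments rejecting H0 with NPEs and with H-B midpoints."""
    rng = random.Random(seed)
    hits_npe = hits_hb = 0
    for _ in range(reps):
        npe = []
        for mu in (mu1, mu2):
            actual = [min(max(rng.gauss(mu, pop_sd), 0.0), 100.0) for _ in range(n)]
            npe.append([simulate_npe(y, rng, **rater) for y in actual])
        hb = [[hb_midpoint(x) for x in group] for group in npe]  # same data, H-B midpoints
        hits_npe += pooled_t_test_p(*npe) < alpha
        hits_hb += pooled_t_test_p(*hb) < alpha
    return hits_npe / reps, hits_hb / reps


for n in (5, 10, 20, 40):
    print(n, rejection_rates(n))
```

Because both rates in each replicate come from the same simulated leaves, the gap between them reflects only the midpoint conversion. Passing rater keywords (for example `sd_max=3.0`) changes the hypothetical rater's precision.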
When H0 was false, inaccurate raters had a similar probability of rejecting it with either assessment method, whereas average and accurate raters were more likely to reject it with NPEs than with H-B scale data. On average, accurate raters resolved disease severity more finely than the H-B scale allows, so when sample size was limiting, the resulting sample variability was more representative of the population. Thus, under various circumstances, H-B scale data carry a greater risk than NPEs of failing to reject H0 when H0 is false (a type II error).
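The point about resolving power can be illustrated with a small standard-library Python sketch that is not taken from the paper and does not reproduce its hypothesis-testing results. It assumes actual severities uniform over 0 to 60%, additive normal rater error with hypothetical SDs of 2, 6 and 15 percentage points for "accurate", "average" and "inaccurate" raters, and H-B grades assigned from the rater's own estimate (boundary values falling in the lower grade). It compares the root-mean-square error (RMSE) of the raw estimate and of its H-B midpoint against actual severity; `compare_resolution` is a hypothetical helper name.

```python
import math
import random

# H-B intervals as (upper bound %, midpoint %); exactly 0% and 100% are separate grades.
# Assumption: boundary values fall in the lower interval.
HB_INTERVALS = [(3, 1.5), (6, 4.5), (12, 9.0), (25, 18.5), (50, 37.5),
                (75, 62.5), (88, 81.5), (94, 91.0), (97, 95.5), (100, 98.5)]


def hb_midpoint(severity):
    """Midpoint of the H-B grade interval containing `severity` (%)."""
    if severity <= 0:
        return 0.0
    if severity >= 100:
        return 100.0
    return next(mid for upper, mid in HB_INTERVALS if severity <= upper)


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def compare_resolution(rater_sd, n=20000, seed=7):
    """RMSE against actual severity for a rater's NPEs and for their H-B midpoints.

    Hypothetical setup: actual severity uniform on 0-60%, additive normal rater
    error with SD `rater_sd` percentage points, estimates rounded and kept in 0-100.
    """
    rng = random.Random(seed)
    npe_err, hb_err = [], []
    for _ in range(n):
        actual = rng.uniform(0.0, 60.0)
        npe = float(min(max(round(rng.gauss(actual, rater_sd)), 0), 100))
        npe_err.append(npe - actual)
        hb_err.append(hb_midpoint(npe) - actual)
    return rmse(npe_err), rmse(hb_err)


for label, sd in (("accurate", 2.0), ("average", 6.0), ("inaccurate", 15.0)):
    npe, hb = compare_resolution(sd)
    print(f"{label:>10}: NPE RMSE {npe:5.2f}, H-B RMSE {hb:5.2f}, ratio {hb / npe:4.2f}")
```

Under these assumptions, the midpoint conversion adds proportionally more error for the accurate rater than for the inaccurate one, whose own scatter is already large relative to most H-B intervals in this severity range.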