Qu Yuanshuo, Kne Len, Graham Steve, Watkins Eric, Morris Kevin
National Turfgrass Evaluation Program, Beltsville, MD, United States.
U-Spatial, University of Minnesota, Minneapolis, MN, United States.
Front Plant Sci. 2023 Jul 6;14:1135918. doi: 10.3389/fpls.2023.1135918. eCollection 2023.
The traditional evaluation procedure in the National Turfgrass Evaluation Program (NTEP) relies on visual assessment of replicated turf plots at multiple testing locations. This process yields ordinal data; however, the subsequent analyses have almost exclusively applied statistical models that incorrectly treat these data as interval or ratio data. This practice raises concerns about procedural subjectivity and prevents objective comparisons of cultivars across test locations. It may also lead to serious errors, such as inflated false-alarm (Type I) rates, failures to detect real effects (Type II errors), and even inversions of the differences among groups.
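The inversion risk can be illustrated with a small sketch. It assumes a 1 to 9 rating scale whose latent cut points sit at the half-points (1.5, ..., 8.5) and two hypothetical cultivars with normally distributed latent quality; the numbers are invented for illustration and are not NTEP data. Cultivar B has the higher latent mean, yet its wide latent spread piles its ratings up at both ends of the bounded scale, so its average rating falls below A's.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function (math.erf handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(mu, sigma, thresholds):
    """P(Y = k), k = 1..K, when ratings come from cutting a latent N(mu, sigma^2)."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf((c - mu) / sigma) for c in cuts]
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) - 1)]


def expected_rating(mu, sigma, thresholds):
    """Mean of the ordinal ratings, i.e. what an interval-scale analysis averages."""
    probs = category_probs(mu, sigma, thresholds)
    return sum((k + 1) * p for k, p in enumerate(probs))


# Assumed 1 to 9 scale with latent cut points at 1.5, 2.5, ..., 8.5.
THRESHOLDS = [k + 0.5 for k in range(1, 9)]

# Hypothetical cultivars: B has the higher latent mean but a much wider spread.
rating_a = expected_rating(7.0, 0.5, THRESHOLDS)
rating_b = expected_rating(7.5, 5.0, THRESHOLDS)
print(f"A: latent mean 7.0, mean rating {rating_a:.2f}")
print(f"B: latent mean 7.5, mean rating {rating_b:.2f}")
```

Averaging the raw ratings ranks A above B (roughly 7.0 versus 6.4) even though B is better on the latent scale; a model that treats the ratings as ordinal and estimates the latent means and spreads avoids this reversal.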
We review this problem, identify sources of subjectivity, and present a model-based approach that minimizes subjectivity, enabling objective comparisons of cultivars across locations and better monitoring of the evaluation procedure. We demonstrate how to fit this model in a Bayesian framework with Stan, using overall turf quality ratings from the 2017 NTEP Kentucky bluegrass trials at seven testing locations.
Compared with the existing method, our approach allows the estimation of additional parameters, namely category thresholds, rating severity, and within-field spatial variation, and it provides better separation of cultivar means and more realistic standard deviations.
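The additional parameters can be pictured in a cumulative-link formulation. The sketch below is a hypothetical illustration, not the authors' Stan model: it assumes a cumulative-logit link, a latent location equal to the cultivar effect plus a plot-level spatial effect minus rater severity, and made-up logit-scale thresholds. Location-specific category thresholds could be represented by passing a different threshold list for each location.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function; also handles +/-inf."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rating_probs(cultivar_effect, spatial_effect, rater_severity, thresholds):
    """Category probabilities under an assumed cumulative-logit model."""
    # Latent location: a more severe rater (larger severity) shifts ratings down.
    eta = cultivar_effect + spatial_effect - rater_severity
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    # P(Y <= k) = logistic(tau_k - eta); differences give P(Y = k).
    cum = [logistic(c - eta) for c in cuts]
    return [cum[k + 1] - cum[k] for k in range(len(cuts) - 1)]


def expected_rating(probs):
    """Mean rating on the 1..K scale implied by the category probabilities."""
    return sum((k + 1) * p for k, p in enumerate(probs))


# Hypothetical logit-scale thresholds for a 1 to 9 scale at one location.
THRESHOLDS = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]

# The same plot (same cultivar and spatial effect) scored by two raters.
lenient = rating_probs(1.0, 0.2, -0.5, THRESHOLDS)
severe = rating_probs(1.0, 0.2, 1.0, THRESHOLDS)
print(round(expected_rating(lenient), 2), round(expected_rating(severe), 2))
```

Because severity enters on the latent scale rather than on the raw scores, such a model can, provided raters score overlapping plots, distinguish a harsh rater from a genuinely poorer plot, which a mean of raw ratings cannot do.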
Implementing the proposed model requires additional information on rater identification, trial layout, and rating date. Given the model assumptions, we recommend small trials to reduce rater fatigue; for large trials, ratings can be conducted replication by replication on multiple occasions instead of all at once. Multiple raters are required to minimize subjectivity. We also propose new ideas for temporal analysis that incorporate existing knowledge of turfgrass.
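A minimal sketch of a record layout carrying the extra information, assuming a hypothetical schema (RatingRecord, its field names, and the two helper functions are illustrative, not an NTEP format): each rating stores the rater ID for severity, the plot's row and column for spatial effects, and the rating date. The helpers count how many raters scored each cultivar and list the occasions on which each replication was rated.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RatingRecord:
    cultivar: str
    replication: int
    row: int        # plot position in the field, for within-field spatial effects
    col: int
    rater: str      # rater ID, for estimating rating severity
    rated_on: date  # rating date, for occasions and temporal structure
    rating: int     # ordinal turf quality score (assumed 1 to 9 scale)


def raters_per_cultivar(records):
    """Number of distinct raters who scored each cultivar."""
    raters = defaultdict(set)
    for r in records:
        raters[r.cultivar].add(r.rater)
    return {c: len(s) for c, s in raters.items()}


def occasions_per_replication(records):
    """Sorted rating dates for each replication (one entry per occasion)."""
    dates = defaultdict(set)
    for r in records:
        dates[r.replication].add(r.rated_on)
    return {rep: sorted(d) for rep, d in dates.items()}


# Hypothetical records: replication 2 is rated on a different day than replication 1.
records = [
    RatingRecord("A", 1, 1, 1, "R1", date(2017, 6, 1), 6),
    RatingRecord("A", 1, 1, 1, "R2", date(2017, 6, 1), 7),
    RatingRecord("B", 1, 1, 2, "R1", date(2017, 6, 1), 5),
    RatingRecord("B", 2, 2, 1, "R2", date(2017, 6, 2), 6),
]
print(raters_per_cultivar(records))
print(occasions_per_replication(records))
```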