U.S. Department of Agriculture, Agricultural Research Service, U.S. Agricultural Research Station, Crop Improvement and Protection Research Unit, Salinas, California, United States of America.
U.S. Department of Agriculture, National Forage Seed Production Research Center, Corvallis, Oregon, United States of America.
PLoS One. 2018 Apr 17;13(4):e0194635. doi: 10.1371/journal.pone.0194635. eCollection 2018.
Visual assessments are used to evaluate the quality of food products, such as fresh-cut lettuce packaged in modified-atmosphere bags. We compared the accuracy and reliability of visual evaluations of decay on fresh-cut lettuce performed by experienced and inexperienced raters. In addition, we analyzed decay data from more than 4,500 bags to determine the optimum timing of evaluations for detecting differences among accessions. Lin's concordance coefficient (ρc), which takes into consideration both the closeness of the data and their conformance to the identity line, showed high repeatability (intra-rater reliability, ρc = 0.97), reproducibility (inter-rater reliability, ρc = 0.92), and accuracy (ρc = 0.96) for experienced raters. Inexperienced raters did not perform as well: their ratings showed lower repeatability (ρc = 0.93) and even larger reductions in reproducibility (ρc = 0.80) and accuracy (ρc = 0.90). We found that 5.3% of ratings fell outside the 95% limits of agreement. These under- or overestimates occurred predominantly for bags with intermediate levels of decay, corresponding to the middle of the rating scale, because intermediate amounts of decay are more difficult to discriminate than the extremes. The frequency of aberrant ratings ranged from 0.6% to 4.4% (mean = 2.1%) for experienced raters; for inexperienced raters it was substantially higher, ranging from 6.1% to 15.6% (mean = 9.4%). Therefore, we recommend that new raters receive training that includes practical examples in this range of decay, the use of standard area diagrams, and continuing interaction with experienced raters (consultation during actual rating). The very high agreement among experienced raters indicates that visual ratings can be used successfully for evaluations of decay until a more objective, rapid, and affordable method is developed.
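As an illustration of the two agreement statistics named above, the following minimal sketch computes Lin's concordance coefficient and 95% limits of agreement for a pair of rating series. It assumes Lin's standard definition, ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²) with 1/n moments, and the conventional Bland-Altman limits (mean difference ± 1.96 standard deviations); the abstract does not state which variants the authors used. The function names and the example ratings are hypothetical.

```python
from statistics import fmean, stdev


def lin_ccc(x, y):
    """Lin's concordance coefficient between two paired rating series.

    Assumption: population (1/n) variances and covariance, as in
    Lin's original definition.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The (mean_x - mean_y)^2 term penalizes systematic shifts away
    # from the identity line, which plain correlation ignores.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def limits_of_agreement(x, y, z=1.96):
    """Bland-Altman style 95% limits of agreement for paired ratings.

    Assumption: limits are mean difference +/- z * sample SD of differences.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = fmean(diffs)
    sd_diff = stdev(diffs)
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


if __name__ == "__main__":
    # Hypothetical decay ratings (percent) given to the same bags by two raters.
    rater_a = [0, 10, 30, 50, 80, 100]
    rater_b = [0, 15, 25, 60, 75, 100]
    print("rho_c:", round(lin_ccc(rater_a, rater_b), 3))
    low, high = limits_of_agreement(rater_a, rater_b)
    outside = [d for d in (a - b for a, b in zip(rater_a, rater_b))
               if d < low or d > high]
    print("limits of agreement:", round(low, 2), round(high, 2))
    print("ratings outside limits:", len(outside))
```

Unlike Pearson's correlation, ρc falls below 1 when one rater scores consistently higher or lower than the other, which is why it suits comparisons against a reference rating.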
We recommend evaluating samples at multiple time points until 42 days after processing (about 80% decay on average) and then combining these individual ratings into the area under the decay progress stairs (AUDePS) score. Using this approach, experienced evaluators can accurately detect differences among lettuce accessions and identify lettuce cultivars with reduced decay.
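The abstract does not give the AUDePS formula. The sketch below assumes it mirrors the published area under the disease progress stairs calculation: the trapezoidal area under the progress curve plus a correction of (y_first + y_last)/2 × D/(n − 1), where D is the time between the first and last of n evaluations. The function name, the evaluation days, and the decay values are hypothetical.

```python
def audeps(days, decay):
    """Area under the decay progress stairs for one sample.

    Assumption: AUDePS = trapezoidal area under the progress curve
    + (first + last) / 2 * D / (n - 1), with D = days[-1] - days[0].
    """
    if len(days) != len(decay) or len(days) < 2:
        raise ValueError("need at least 2 paired (day, decay) observations")
    if any(later <= earlier for earlier, later in zip(days, days[1:])):
        raise ValueError("days must be strictly increasing")

    # Trapezoidal area between consecutive evaluations.
    area = sum(
        (decay[i] + decay[i + 1]) / 2 * (days[i + 1] - days[i])
        for i in range(len(days) - 1)
    )
    # Stairs correction: gives the first and last evaluations the same
    # weight as the intermediate ones.
    span = days[-1] - days[0]
    correction = (decay[0] + decay[-1]) / 2 * span / (len(days) - 1)
    return area + correction


if __name__ == "__main__":
    # Hypothetical bag rated every 14 days up to 42 days after processing.
    days = [0, 14, 28, 42]
    decay_percent = [0, 20, 50, 80]
    print(audeps(days, decay_percent))
```

With equally spaced evaluations this reduces to the sum of the ratings multiplied by the interval length, so every time point contributes equally to the score.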