Department of Econometrics and Business Statistics, Monash University, Clayton, Australia.
School of Mathematics and Statistics, The University of Sydney, Camperdown, Australia.
Biometrics. 2020 Dec;76(4):1374-1382. doi: 10.1111/biom.13216. Epub 2020 Jan 30.
The aim of plant breeding trials is often to identify crop varieties that are well adapted to target environments. These varieties are identified through genomic prediction from the analysis of multi-environmental field trials (MET) using linear mixed models. Outliers are common in MET and are known to adversely affect the accuracy of genomic prediction, yet outlier detection is often neglected. There are several reasons for this. First, complex data such as a MET give rise to distinct levels of residuals (e.g., at the trial level or the individual observation level), and this complexity poses additional challenges for an outlier detection method. Second, many linear mixed model software packages that cater for the complex variance structures needed in the analysis of MET are not well streamlined for diagnostics by practitioners. We demonstrate outlier detection methods that are computationally fast and simple to implement in any linear mixed model software package. Although these methods are not optimal for outlier detection, they offer practical value through their ease of application in the analysis pipeline of regularly collected data. The methods are demonstrated using simulations based on two real bread wheat yield METs. In particular, we consider models that analyse yield trials either independently or jointly (thus borrowing strength across trials). Case studies are presented to highlight the benefit of joint analysis for outlier detection.