Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, 110012, New Delhi, India.
Sci Rep. 2020 May 21;10(1):8408. doi: 10.1038/s41598-020-65323-3.
The predictive performance of genomic prediction methods is expected to suffer in the presence of outliers. In agricultural science, outliers may arise from incorrect data imputation, outlying responses, or series of trials conducted over time or across locations. Although several statistical procedures for outlier identification already exist in the literature, identifying true outliers remains a challenge, especially in high-dimensional genomic data. Here we propose an efficient approach for detecting outliers in high-dimensional genomic data. The approach uses p-value combination methods to produce a single p-value for detecting outliers. Its robustness was tested on simulated data using evaluation measures such as precision and recall. On real data, detecting outliers with the proposed approach and handling them accordingly yielded a significant improvement in genomic prediction performance.
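To illustrate the idea of combining several p-values into a single p-value per observation, the following is a minimal sketch, not the authors' implementation. It assumes Fisher's method as the combination rule (the abstract does not say which combination methods were used), and it assumes each observation already has one p-value from each of several outlier detectors. The function names `fisher_combine` and `flag_outliers`, the threshold `alpha`, and the example p-values are all hypothetical. Fisher's method also assumes the p-values are independent, which may not hold for detectors run on the same data.

```python
import math


def fisher_combine(p_values):
    """Combine p-values with Fisher's method (assumed rule, not from the paper).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    For even degrees of freedom the survival function has a closed form:
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Clamp to (0, 1] so that log(0) cannot occur.
    clipped = [min(max(p, 1e-300), 1.0) for p in p_values]
    half_stat = -sum(math.log(p) for p in clipped)  # this is X / 2
    term = 1.0   # running value of (x/2)^j / j!
    total = 1.0  # partial sum, starting with the j = 0 term
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def flag_outliers(p_matrix, alpha=0.05):
    """Return indices whose combined p-value is below alpha, plus all combined values.

    p_matrix[i] holds the p-values for observation i, one per detector.
    """
    combined = [fisher_combine(row) for row in p_matrix]
    flagged = [i for i, p in enumerate(combined) if p < alpha]
    return flagged, combined


# Hypothetical p-values: three observations, three detectors each.
example_p = [
    [0.40, 0.65, 0.52],    # no detector is suspicious
    [0.001, 0.004, 0.02],  # every detector is suspicious
    [0.30, 0.08, 0.75],    # weak, inconsistent evidence
]
flagged, combined = flag_outliers(example_p, alpha=0.05)
print(flagged)
```

With these illustrative inputs only the second observation is flagged, because its evidence is consistent across detectors. In practice the threshold would usually be adjusted for multiple testing across observations.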