De Coninck Arne, Fostier Jan, Maenhout Steven, De Baets Bernard
Research Unit Knowledge-based Systems KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, B-9000 Ghent, Belgium
IBCN, Internet Based Communication Networks and Services Research Unit, Department of Information Technology, Ghent University-iMinds, B-9000 Ghent, Belgium
Genetics. 2014 Jul;197(3):813-22. doi: 10.1534/genetics.114.163683. Epub 2014 Apr 15.
In genomic prediction, common analysis methods rely on a linear mixed-model framework to estimate SNP marker effects and breeding values of animals or plants. Ridge regression-best linear unbiased prediction (RR-BLUP) is based on the assumptions that SNP marker effects are normally distributed, are uncorrelated, and have equal variances. We propose DAIRRy-BLUP, a parallel, Distributed-memory RR-BLUP implementation, based on single-trait observations (y), that uses the Average Information algorithm for restricted maximum-likelihood estimation of the variance components. The goal of DAIRRy-BLUP is to enable the analysis of large-scale data sets to provide more accurate estimates of marker effects and breeding values. A distributed-memory framework is required since the dimensionality of the problem, determined by the number of SNP markers, can become too large to be analyzed by a single computing node. Initial results show that DAIRRy-BLUP enables the analysis of very large-scale data sets (up to 1,000,000 individuals and 360,000 SNPs) and indicate that increasing the number of phenotypic and genotypic records has a more significant effect on the prediction accuracy than increasing the density of SNP arrays.
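To make the RR-BLUP model concrete, the following is a minimal single-node sketch in Python with NumPy. It is not the DAIRRy-BLUP implementation. It assumes a model with one fixed intercept and random marker effects with a common variance, and it takes the variance ratio `lam` (residual variance divided by marker-effect variance) as known, whereas the paper estimates the variance components with Average Information REML. The function names `rr_blup` and `simulate`, the simulated genotype coding (0, 1, 2), and all parameter values are illustrative assumptions.

```python
import numpy as np


def rr_blup(Z, y, lam):
    """Solve the RR-BLUP mixed-model equations for y = 1*mu + Z*u + e.

    Z   : (n, m) genotype matrix, one row per individual, one column per SNP
    y   : (n,) single-trait phenotype vector
    lam : assumed-known ratio sigma_e^2 / sigma_u^2 (not estimated here)
    Returns the intercept estimate and the (m,) vector of predicted marker effects.
    """
    n, m = Z.shape
    X = np.ones((n, 1))  # design matrix for the fixed intercept
    # Coefficient matrix of the mixed-model equations; only the
    # marker-effect block is penalized by lam.
    C = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.eye(m)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(C, rhs)
    return sol[0], sol[1:]


def simulate(n=200, m=50, sigma_u=0.3, sigma_e=1.0, seed=0):
    """Simulate toy genotypes and phenotypes under the RR-BLUP assumptions."""
    rng = np.random.default_rng(seed)
    Z = rng.integers(0, 3, size=(n, m)).astype(float)  # allele counts 0/1/2
    u = rng.normal(0.0, sigma_u, size=m)  # uncorrelated, equal-variance effects
    y = 2.0 + Z @ u + rng.normal(0.0, sigma_e, size=n)
    return Z, y, u


if __name__ == "__main__":
    Z, y, u_true = simulate()
    mu_hat, u_hat = rr_blup(Z, y, lam=(1.0 / 0.3) ** 2)
    gebv = Z @ u_hat  # breeding values predicted from marker effects
    print("intercept estimate:", mu_hat)
    print("correlation of estimated and simulated effects:",
          np.corrcoef(u_hat, u_true)[0, 1])
```

The coefficient matrix has dimension m + 1, so its size grows with the number of SNP markers rather than the number of individuals. This is the dimensionality the abstract refers to when it motivates a distributed-memory implementation: at 360,000 SNPs a dense matrix of this kind no longer fits comfortably on a single node, and the dense solve above would have to be replaced by a distributed one.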