Pook Torsten, Freudenthal Jan, Korte Arthur, Simianer Henner
Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Goettingen, Göttingen, Germany.
Center for Computational and Theoretical Biology, University of Wuerzburg, Wuerzburg, Germany.
Front Genet. 2020 Nov 12;11:561497. doi: 10.3389/fgene.2020.561497. eCollection 2020.
The prediction of breeding values and phenotypes is of central importance for both livestock and crop breeding. In this study, we analyze the use of artificial neural networks (ANN) and, in particular, local convolutional neural networks (LCNN) for genomic prediction, because region-specific filters correspond much better to our prior knowledge of the genetic architecture of traits than the filters of traditional convolutional neural networks. Model performance is evaluated by predictive ability on a simulated maize data panel (n = 10,000; p = 34,595) and real Arabidopsis data (n = 2,039; p = 180,000) for a variety of traits. The baseline LCNN, containing one local convolutional layer (kernel size: 10) and two fully connected layers with 64 nodes each, outperforms commonly proposed ANNs (multilayer perceptrons and convolutional neural networks) for almost all considered traits. For traits with high heritability and a large training population, as present in the simulated data, LCNNs even outperform state-of-the-art methods such as genomic best linear unbiased prediction (GBLUP), Bayesian models, and extended GBLUP, with predictive ability increasing by up to 24%. For small training populations, however, these state-of-the-art methods outperform all considered ANNs, although the LCNN still outperforms all other considered ANNs by around 10%. Minor improvements over the tested baseline LCNN architecture were obtained by increasing the kernel size and reducing the stride, whereas the number of subsequent fully connected layers and their node sizes had negligible impact. Although LCNNs yielded gains in predictive ability for large-scale data sets, the practical use of ANNs comes with additional problems, such as the need to genotype all considered individuals and the lack of estimates of heritability and reliability. Furthermore, breeding values are additive by design, whereas ANN-based estimates are not. However, ANNs also come with new opportunities, as networks can easily be extended to account for additional inputs (omics, weather, etc.) and outputs (multi-trait models), and computing time increases linearly with the number of individuals. With advances in high-throughput phenotyping and cheaper genotyping, ANNs can become a valid alternative for genomic prediction.
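To make the baseline architecture concrete, the following is a minimal NumPy sketch of a forward pass through a network with one locally connected layer (kernel size 10) followed by two fully connected layers of 64 nodes each. It is an illustration under stated assumptions, not the authors' implementation: the stride (set equal to the kernel size), the ReLU activations, the single linear output node, the random initialization, and all function names (`local_conv1d`, `init_lcnn`, `predict`) are hypothetical choices that the abstract does not specify. The point it shows is that, unlike an ordinary convolution, each marker window has its own kernel, so the filter is region-specific.

```python
import numpy as np


def local_conv1d(x, weights, bias, stride):
    """Locally connected 1-D layer: every window has its own kernel (no weight sharing).

    x:       (n_individuals, n_markers) genotype matrix
    weights: (n_windows, kernel_size), one kernel per genomic window
    bias:    (n_windows,)
    """
    n_windows, kernel_size = weights.shape
    out = np.empty((x.shape[0], n_windows))
    for w in range(n_windows):
        start = w * stride
        # Window w is multiplied by its own kernel, weights[w].
        out[:, w] = x[:, start:start + kernel_size] @ weights[w] + bias[w]
    return out


def relu(z):
    return np.maximum(z, 0.0)


def init_lcnn(n_markers, kernel_size=10, stride=10, hidden=64, seed=0):
    """Randomly initialize the sketch network (stride and init scale are assumptions)."""
    rng = np.random.default_rng(seed)
    n_windows = (n_markers - kernel_size) // stride + 1
    sizes = [n_windows, hidden, hidden, 1]  # two hidden layers, one output node
    return {
        "stride": stride,
        "local_w": rng.normal(0.0, 0.1, (n_windows, kernel_size)),
        "local_b": np.zeros(n_windows),
        "dense": [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
                  for a, b in zip(sizes[:-1], sizes[1:])],
    }


def predict(params, x):
    """Forward pass: local layer -> two dense ReLU layers -> linear output."""
    h = relu(local_conv1d(x, params["local_w"], params["local_b"], params["stride"]))
    for W, b in params["dense"][:-1]:
        h = relu(h @ W + b)
    W, b = params["dense"][-1]
    return (h @ W + b).ravel()


# Toy usage: 5 individuals, 200 markers coded 0/1/2 (made-up data, untrained weights).
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(5, 200)).astype(float)
params = init_lcnn(n_markers=200)
predictions = predict(params, genotypes)  # one predicted value per individual
```

With a kernel size and stride of 10, the local layer in this sketch holds one 10-weight kernel per window, so its parameter count grows with the number of markers, whereas a standard convolutional layer would reuse a single kernel across all windows.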