Jubair Sheikh, Tucker James R, Henderson Nathan, Hiebert Colin W, Badea Ana, Domaratzki Michael, Fernando W G Dilantha
Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada.
Department of Plant Science, University of Manitoba, Winnipeg, MB, Canada.
Front Plant Sci. 2021 Dec 16;12:761402. doi: 10.3389/fpls.2021.761402. eCollection 2021.
Fusarium head blight (FHB), incited by Fusarium graminearum Schwabe, is a devastating disease of barley and other cereal crops worldwide. Fusarium head blight is associated with trichothecene mycotoxins such as deoxynivalenol (DON), which contaminate grain and make it unfit for the malting and animal feed industries. While genetically resistant cultivars offer the most economical and environmentally responsible means of mitigating the disease, parent lines with adequate resistance are limited in barley. Resistance breeding based on quantitative genetic gains has been slow to date because of the intensive labor requirements of disease nurseries. The production of a high-throughput, genome-wide molecular marker assembly for barley permits the development of genomic prediction models for traits of economic importance to this crop. A diverse panel of 400 two-row spring barley lines was assembled with a focus on Canadian barley breeding programs. The panel was evaluated for FHB and DON content in three environments over 2 years. It was also genotyped using an Illumina Infinium High-Throughput Screening (HTS) iSelect custom beadchip array of single nucleotide polymorphism (SNP) markers (50 K SNP), of which over 23 K were polymorphic. Genomic prediction using various statistical models has been shown to successfully reduce FHB and DON content in cereals. Herein, we studied an alternative method based on machine learning and compared it with a statistical approach. The bi-allelic SNPs represented pairs of alleles and were encoded in two ways: as categorical values (-1, 0, 1) or using Hardy-Weinberg probability frequencies. This was followed by selection of essential genomic markers for phenotype prediction. Subsequently, a Transformer-based deep learning algorithm was applied to predict FHB and DON. In addition to the Transformer method, a Residual Fully Connected Neural Network (RFCNN) was applied. Pearson correlation coefficients were calculated to compare true vs. predicted outputs. Models that included all markers generally showed marginal improvement in prediction. Hardy-Weinberg encoding generally improved correlation for FHB (6.9%) and DON (9.6%) for the Transformer network. This study suggests the potential of the Transformer-based method as an alternative to the popular BLUP model for genomic prediction of complex traits such as FHB or DON, having performed as well as or better than existing machine learning and statistical methods.
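The two marker encodings and the evaluation metric described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact Hardy-Weinberg mapping, so the code below assumes that each genotype call is replaced by its expected genotype frequency under Hardy-Weinberg equilibrium (q², 2pq, or p²), with allele frequencies estimated per marker from the panel itself. The function names `hardy_weinberg_encode` and `pearson_r` are hypothetical and are not taken from the study.

```python
import numpy as np

def hardy_weinberg_encode(genotypes):
    """Map categorical calls in {-1, 0, 1} to expected genotype frequencies.

    genotypes: integer array of shape (n_lines, n_markers).
    ASSUMPTION: the mapping used here (q^2, 2pq, p^2) is one plausible
    reading of "Hardy-Weinberg probability frequencies", not the
    authors' confirmed procedure.
    """
    g = np.asarray(genotypes, dtype=int)
    # Frequency of the allele coded +1 at each marker: allele count / (2 * n_lines).
    p = (g + 1).sum(axis=0) / (2.0 * g.shape[0])
    q = 1.0 - p
    # Rows hold the expected frequency of genotypes -1, 0 and 1 per marker.
    freq = np.stack([q ** 2, 2 * p * q, p ** 2])
    # For each call, look up the frequency of that genotype at that marker.
    return np.take_along_axis(freq, g + 1, axis=0)

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted phenotypes."""
    a = np.asarray(y_true, dtype=float)
    b = np.asarray(y_pred, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

# Toy data: 4 lines x 2 markers, categorical encoding.
calls = np.array([[-1, 1],
                  [0, 1],
                  [1, 1],
                  [1, -1]])
encoded = hardy_weinberg_encode(calls)
```

Under this assumption the encoded matrix keeps the shape of the categorical one, so either encoding could be passed to the same downstream model, and `pearson_r` would then be applied to held-out true and predicted FHB or DON values.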