Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.
Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell'Università 16, 35020, Legnaro, Italy.
Genet Sel Evol. 2021 Mar 16;53(1):29. doi: 10.1186/s12711-021-00620-7.
Over the past decade, Fourier transform infrared (FTIR) spectroscopy has been used to predict novel milk protein phenotypes. Genomic data might help predict these phenotypes when integrated with milk FTIR spectra. The objective of this study was to investigate prediction accuracy for milk protein phenotypes when heterogeneous on-farm, genomic, and pedigree data were integrated with the spectra. To this end, we used the records of 966 Italian Brown Swiss cows with milk FTIR spectra, on-farm information, medium-density genetic markers, and pedigree data. True and total whey protein, five casein traits, and two whey protein traits were analyzed. Two integration methods were benchmarked against a baseline partial least squares (PLS) regression: multiple kernel learning, constructed from spectral and genomic (or pedigree) relationship matrices, and multilayer BayesB, which assigns separate priors to FTIR and markers. Seven combinations of covariates were considered, and their predictive abilities were evaluated by repeated random sub-sampling and herd cross-validations (CV).
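As a rough illustration of how relationship matrices built from spectra and from markers can be combined into a single kernel, the following minimal sketch uses simulated data. It is not the authors' implementation: the data, the equal kernel weights, the ridge parameter `lam`, and the function names `relationship_matrix` and `kernel_ridge_predict` are all assumptions made for illustration. A real multiple kernel analysis would estimate the kernel weights from the data rather than fix them.

```python
import numpy as np

def relationship_matrix(X):
    """Linear kernel from column-standardized features, scaled by feature count.

    Simplified stand-in for a spectral or genomic relationship matrix.
    Standardization uses features only, so no phenotype information leaks.
    """
    Z = X - X.mean(axis=0)
    sd = Z.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    Z = Z / sd
    return Z @ Z.T / X.shape[1]

def kernel_ridge_predict(K, y, train, test, lam=1.0):
    """Fit kernel ridge regression on `train` animals and predict `test` animals."""
    mu = y[train].mean()
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(A, y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

# Simulated data (assumption): 200 animals, 50 spectral points, 300 markers.
rng = np.random.default_rng(0)
n = 200
spectra = rng.normal(size=(n, 50))
markers = rng.integers(0, 3, size=(n, 300)).astype(float)  # 0/1/2 genotype codes
y = (0.1 * (spectra @ rng.normal(size=50))
     + 0.05 * (markers @ rng.normal(size=300))
     + rng.normal(size=n))

K_s = relationship_matrix(spectra)  # spectral relationship matrix
K_g = relationship_matrix(markers)  # genomic relationship matrix
K_mkl = 0.5 * K_s + 0.5 * K_g       # fixed equal weights (assumption)

perm = rng.permutation(n)
train, test = perm[:160], perm[160:]

pred_spec = kernel_ridge_predict(K_s, y, train, test)
pred_mkl = kernel_ridge_predict(K_mkl, y, train, test)

print("spectra only:", np.corrcoef(pred_spec, y[test])[0, 1])
print("combined    :", np.corrcoef(pred_mkl, y[test])[0, 1])
```

The weighted sum of two positive semidefinite kernels is itself a valid kernel, which is why heterogeneous data sources can be merged at the kernel level before fitting a single model.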
Addition of the on-farm effects such as herd, days in milk, and parity to spectral data improved predictions as compared to those obtained using the spectra alone. Integrating genomics and/or the top three markers with a large effect further enhanced the predictions. Pedigree data also improved prediction, but to a lesser extent than genomic data. Multiple kernel learning and multilayer BayesB increased predictive performance, whereas PLS did not. Overall, multilayer BayesB provided better predictions than multiple kernel learning, and lower prediction performance was observed in herd CV compared to repeated random sub-sampling CV.
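The difference between the two validation schemes can be sketched as follows. In repeated random sub-sampling, cows from the same herd can appear in both training and testing sets, whereas herd CV holds out whole herds, so the model must predict animals from herds it has not seen. The number of repeats, the test fraction, the number of folds, and the simulated herd labels below are assumptions for illustration; the abstract does not state the values used in the study.

```python
import numpy as np

def random_subsampling_splits(n, n_repeats=10, test_fraction=0.2, seed=0):
    """Yield (train, test) index arrays from repeated random partitions."""
    rng = np.random.default_rng(seed)
    n_test = int(round(n * test_fraction))
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        yield perm[n_test:], perm[:n_test]

def herd_splits(herd, n_folds=5, seed=0):
    """Yield (train, test) index arrays in which whole herds are held out."""
    rng = np.random.default_rng(seed)
    herds = rng.permutation(np.unique(herd))
    for fold in np.array_split(herds, n_folds):
        test_mask = np.isin(herd, fold)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical herd labels: 20 herds with 10 cows each.
herd = np.repeat(np.arange(20), 10)

random_folds = list(random_subsampling_splits(len(herd)))
herd_folds = list(herd_splits(herd))

for train, test in herd_folds:
    shared = np.intersect1d(herd[train], herd[test])
    print("herds held out:", np.unique(herd[test]), "shared herds:", shared.size)
```

Because herd effects cannot be learned for held-out herds, lower predictive performance under herd CV than under random sub-sampling is the expected pattern.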
Integration of genomic information with milk FTIR spectra can enhance milk protein trait predictions by 25% and 7% on average for repeated random sub-sampling and herd CV, respectively. Multiple kernel learning and multilayer BayesB outperformed PLS when used to integrate heterogeneous data for phenotypic predictions.