Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China.
Theor Appl Genet. 2012 Mar;124(5):825-33. doi: 10.1007/s00122-011-1747-9. Epub 2011 Nov 19.
The performance of hybrids can be predicted from gene expression data of their parental inbred lines. Implementing such prediction approaches in breeding programs promises to increase the efficiency of hybrid breeding. The objective of our study was to compare the accuracy of prediction models employing multiple linear regression (MLR), partial least squares regression (PLS), support vector machine regression (SVM), and transcriptome-based distances (D(B)). For a factorial of 7 flint and 14 dent maize lines, the grain yield of the hybrids was assessed, and the gene expression of the parental lines was profiled with a 56k microarray. The accuracy of the prediction models was measured as the correlation between predicted and observed yield under two cross-validation schemes. The first modeled the prediction of hybrids for which testcross data are available for both parental lines (type 2 hybrids); the second modeled the prediction of hybrids for which no testcross data are available for either parental line (type 0 hybrids). MLR, SVM, and PLS yielded high correlations between predicted and observed yield for type 2 hybrids, whereas for type 0 hybrids D(B) had greater prediction accuracy. The regression methods were robust to the choice of the set of profiled genes and required only a few hundred genes. In contrast, accurate hybrid prediction with D(B) required 1,000-1,500 genes, and its prediction accuracy depended strongly on the set of profiled genes. We conclude that MLR is a promising approach for prediction within one set of genetic material, whereas the transcriptome-based distance D(B) is most promising for transferring prediction models from one set of genetic material to a related one.
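The two cross-validation schemes differ only in which hybrids are withheld from training. The sketch below is one plausible way to build such folds for a complete flint × dent factorial; the exact fold construction used in the study is not given in the abstract, so the leave-one-out layout and the function names (`factorial_hybrids`, `type2_folds`, `type0_folds`) are hypothetical. For type 2, a single hybrid is held out while its parents keep testcross data through their other crosses; for type 0, every training hybrid that shares a parent with the test hybrid is dropped as well.

```python
from itertools import product


def factorial_hybrids(flints, dents):
    """Return every flint x dent cross of a complete factorial (hypothetical helper)."""
    return list(product(flints, dents))


def type2_folds(hybrids):
    """Leave-one-hybrid-out folds (assumed layout, not the study's own code).

    The held-out hybrid's flint and dent parents still occur in other training
    hybrids, so testcross data exist for both parents ('type 2'), provided each
    line was crossed to at least two testers.
    """
    for test in hybrids:
        train = [h for h in hybrids if h != test]
        yield train, [test]


def type0_folds(hybrids):
    """Leave-one-flint-and-one-dent-out folds (assumed layout).

    The test hybrid is the cross of the held-out flint and dent lines; every
    training hybrid sharing either parent is removed, so neither parent has
    testcross data ('type 0').
    """
    evaluated = set(hybrids)
    flints = sorted({f for f, _ in hybrids})
    dents = sorted({d for _, d in hybrids})
    for f, d in product(flints, dents):
        if (f, d) not in evaluated:
            continue  # skip crosses that were never evaluated
        train = [h for h in hybrids if h[0] != f and h[1] != d]
        yield train, [(f, d)]
```

Assuming all crosses of the 7 × 14 factorial were evaluated, each scheme yields 98 folds; a type 0 training set keeps only the 6 × 13 = 78 crosses among the remaining lines, which is one reason type 0 prediction is the harder task.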
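Accuracy in the study is the correlation between predicted and observed yield. The sketch below illustrates that metric together with one simple way a parental distance could be turned into a yield prediction: fit a straight line of yield on distance over the training hybrids, then apply it to the test hybrids. The abstract does not define D(B), so a plain Euclidean distance between hypothetical parental expression vectors stands in for it here; the input formats and function names are assumptions, not the authors' method.

```python
import math


def pearson(x, y):
    """Pearson correlation between predicted (x) and observed (y) values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def euclidean(u, v):
    """Stand-in parental distance; this is NOT the paper's D(B)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_by_distance(train, test, profiles, yields):
    """Regress hybrid yield on parental distance in `train`, then predict `test`.

    train, test: lists of (flint, dent) hybrids
    profiles:    inbred line -> expression vector (hypothetical format)
    yields:      (flint, dent) -> observed grain yield
    """
    def dist(h):
        return euclidean(profiles[h[0]], profiles[h[1]])

    a, b = fit_line([dist(h) for h in train], [yields[h] for h in train])
    return [a + b * dist(h) for h in test]
```

In a full cross-validation run, one common choice is to pool the predictions from all folds and compute a single `pearson(predicted, observed)` over them.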