Iwata Hiroyoshi, Ebana Kaworu, Uga Yusaku, Hayashi Takeshi
Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, University of Tokyo, Bunkyo, Tokyo, Japan.
Genetic Resources Center, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki, Japan.
PLoS One. 2015 Mar 31;10(3):e0120610. doi: 10.1371/journal.pone.0120610. eCollection 2015.
Shape is an important morphological characteristic in both animals and plants. In the present study, we examined a method for predicting biological contour shapes from genome-wide marker polymorphisms. The method is expected to accelerate the genetic improvement of biological shape via genomic selection. Grain shape variation observed in rice (Oryza sativa L.) germplasm was delineated using elliptic Fourier descriptors (EFDs) and was predicted from genome-wide single nucleotide polymorphism (SNP) genotypes. We applied four methods, including kernel partial least squares (KPLS) regression, to build a prediction model of grain shape, and compared their accuracy via cross-validation. We analyzed multiple datasets that differed in marker density and sample size. Datasets with a larger sample size and higher marker density showed higher accuracy. Among the four methods, KPLS showed the highest accuracy. Although KPLS and ridge regression (RR) had equivalent accuracy in one dataset, the results suggested the potential of KPLS for the prediction of high-dimensional EFDs. Ordinary PLS, however, was less accurate than RR in all datasets, suggesting that a non-linear kernel is necessary for accurate prediction with the PLS method. Rice grain shape can be predicted accurately from genome-wide SNP genotypes. The proposed method is expected to be useful for genomic selection for biological shape.
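The abstract does not spell out how a contour is turned into EFDs, so the following is only a minimal sketch of the standard Kuhl and Giardina formulation, which is assumed here rather than taken from the authors' code. The function name `elliptic_fourier_descriptors`, the parameter `n_harmonics`, and the test circle are hypothetical, and the size, rotation, and starting-point normalization usually applied before shape analysis is left out.

```python
import numpy as np


def elliptic_fourier_descriptors(contour, n_harmonics=10):
    """Return an (n_harmonics, 4) array of coefficients [a_n, b_n, c_n, d_n].

    `contour` is a sequence of (x, y) points along a closed outline.
    Assumption: unnormalized Kuhl-Giardina coefficients, not the exact
    preprocessing used in the study.
    """
    pts = np.asarray(contour, dtype=float)
    closed = np.vstack([pts, pts[:1]])       # append first point to close the outline
    d = np.diff(closed, axis=0)              # per-segment (dx, dy)
    dt = np.hypot(d[:, 0], d[:, 1])          # per-segment length
    keep = dt > 0                            # drop zero-length segments (repeated points)
    d, dt = d[keep], dt[keep]
    t = np.concatenate([[0.0], np.cumsum(dt)])  # cumulative arc length at each vertex
    T = t[-1]                                # total perimeter

    coeffs = np.zeros((n_harmonics, 4))
    for n in range(1, n_harmonics + 1):
        phi = 2.0 * np.pi * n * t / T
        dcos = np.diff(np.cos(phi))
        dsin = np.diff(np.sin(phi))
        scale = T / (2.0 * n ** 2 * np.pi ** 2)
        coeffs[n - 1] = scale * np.array([
            np.sum(d[:, 0] / dt * dcos),     # a_n
            np.sum(d[:, 0] / dt * dsin),     # b_n
            np.sum(d[:, 1] / dt * dcos),     # c_n
            np.sum(d[:, 1] / dt * dsin),     # d_n
        ])
    return coeffs


# Usage on a synthetic outline: a circle of radius 2 sampled at 360 points.
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
circle = np.column_stack([2.0 * np.cos(theta), 2.0 * np.sin(theta)])
efd = elliptic_fourier_descriptors(circle, n_harmonics=5)
print(efd.round(4))  # first row is close to [2, 0, 0, 2]; higher harmonics are near zero
```

In a prediction setting like the one described, each grain would contribute one flattened coefficient vector, and those vectors would form the multivariate response that the regression methods are trained to predict from SNP genotypes.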