Bjørnstad Asmund, Westad Frank, Martens Harald
Department of Plant and Environmental Sciences, Agricultural University of Norway, Ås, Norway.
Hereditas. 2004;141(2):149-65. doi: 10.1111/j.1601-5223.2004.01816.x.
The utility of a relatively new multivariate method, bilinear modelling by cross-validated partial least squares regression (PLSR), was investigated in the analysis of quantitative trait loci (QTL). The distinguishing feature of PLSR is that it reveals reliable covariance structures in data of different types recorded on the same set of objects. Two matrices, X (here: genetic markers) and Y (here: phenotypes), are interactively decomposed into latent variables (PLS components, or PCs) in a way that facilitates statistically reliable and graphically interpretable model building. Natural collinearities between input variables are actively used to stabilise the modelling instead of being treated as a statistical problem. The importance of cross-validation/jack-knifing as an intuitively appealing way to avoid overfitting is emphasized. Two datasets from chromosomal mapping studies of different complexity were chosen for illustration (QTL for tomato yield and for oat heading date). For these datasets, results from PLSR analysis were compared with published results and with results obtained using the package PLABQTL. In all cases PLSR gave explained validation variances at least similar to those of the reported studies. An attractive feature is that PLSR allows several traits or replicates to be analysed in one analysis and permits direct visual identification of individuals with desirable marker genotypes. It is suggested that PLSR may be useful in structural and functional genomics and in marker-assisted selection, particularly in cases with a limited number of objects.
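The decomposition and cross-validation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single trait (PLS1 fitted by the NIPALS algorithm), markers coded as -1/+1, and a simulated population with one QTL. All function names (`fit_pls1`, `predict`, `cross_validate`, `simulate`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_pls1(X, y, n_components):
    """Fit single-trait PLS (PLS1) by NIPALS on mean-centred data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - mu for v, mu in zip(row, x_mean)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weights: marker-space direction with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no covariance left to model
            break
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in E]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(a * b for a, b in zip(f, t)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def predict(model, x, n_components=None):
    """Predict the trait for one genotype using the first n_components."""
    k = len(model["Q"]) if n_components is None else n_components
    e = [v - mu for v, mu in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"][:k], model["P"][:k], model["Q"][:k]):
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


def cross_validate(X, y, max_components, n_folds=10):
    """Explained validation variance (1 - PRESS/SS) for 1..max_components."""
    n = len(y)
    press = [0.0] * max_components
    for fold in range(n_folds):
        held_out = set(range(fold, n, n_folds))  # interleaved segments
        X_tr = [X[i] for i in range(n) if i not in held_out]
        y_tr = [y[i] for i in range(n) if i not in held_out]
        model = fit_pls1(X_tr, y_tr, max_components)
        for i in held_out:
            for k in range(max_components):
                press[k] += (y[i] - predict(model, X[i], k + 1)) ** 2
    y_mean = sum(y) / n
    ss_total = sum((v - y_mean) ** 2 for v in y)
    return [1.0 - p / ss_total for p in press]


def simulate(n_lines=80, n_markers=20, qtl=7, effect=2.0, recomb=0.1, seed=1):
    """Simulated (assumed) data: linked -1/+1 markers, one QTL at a marker."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_lines):
        g = [rng.choice([-1.0, 1.0])]
        for _ in range(n_markers - 1):
            # Adjacent markers differ only after a recombination event.
            g.append(-g[-1] if rng.random() < recomb else g[-1])
        X.append(g)
        y.append(effect * g[qtl] + rng.gauss(0.0, 1.0))
    return X, y


X, y = simulate()
q2 = cross_validate(X, y, max_components=5)
best = max(range(len(q2)), key=q2.__getitem__) + 1
model = fit_pls1(X, y, best)
top_marker = max(range(len(X[0])), key=lambda j: abs(model["W"][0][j]))
print("explained validation variance per component:", [round(v, 3) for v in q2])
print("components chosen by cross-validation:", best)
print("marker with largest first-component weight:", top_marker)
```

In this sketch the number of components is chosen where the cross-validated explained variance peaks, which is one simple way to guard against overfitting. Because linked markers are collinear, the first weight vector spreads over the markers around the simulated QTL, so the weights are better read as a profile along the chromosome than as single-marker tests.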