Wang J C, Hu J, Xu H M, Zhang S
Department of Agronomy, Zhejiang University, 310029 Hangzhou, China.
Theor Appl Genet. 2007 Jun;115(1):1-8. doi: 10.1007/s00122-007-0533-1. Epub 2007 Apr 3.
A strategy was proposed for constructing core collections by least distance stepwise sampling (LDSS) based on genotypic values. At each step of the stepwise clustering, sampling is performed within the subgroup that has the least distance in the dendrogram. Mean difference percentage (MD), variance difference percentage (VD), coincidence rate of range (CR) and variable rate of coefficient of variation (VR) were used to evaluate how well core collections constructed by this strategy represent the initial collection. A cotton germplasm collection of 1,547 accessions with 18 quantitative traits was used to construct core collections. Genotypic values of all quantitative traits in the cotton collection were predicted without bias using a mixed linear model approach. The strategy was evaluated at three sampling percentages (10, 20 and 30%) using four genetic distances (city block, Euclidean, standardized Euclidean and Mahalanobis distance) combined with four hierarchical clustering methods (nearest distance, furthest distance, unweighted pair-group average and Ward's method). Simulations were conducted to obtain consistent, stable and reproducible results, and principal components analysis was performed to validate the strategy. The results showed that core collections constructed by the LDSS strategy represented the initial collection well. Compared with the control strategy (stepwise clustering with random sampling), the LDSS strategy constructed more representative core collections. For the LDSS strategy, the choice of clustering method did not need to be considered, because all hierarchical clustering methods gave identical results. The results also suggested that standardized Euclidean distance is an appropriate genetic distance for constructing core collections with this strategy.
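The sampling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: that the "subgroup with the least distance" is the closest remaining pair of accessions, that one member of that pair is dropped at random, and that the trait standard deviations used for the standardized Euclidean distance are computed once on the initial collection. The function names `ldss_core` and `coincidence_rate` are hypothetical, and the CR formula shown (mean ratio of core range to initial range, in percent) follows the definition commonly used in the core-collection literature rather than anything given in the abstract.

```python
import numpy as np


def ldss_core(values, fraction, seed=0):
    """Return indices of a core sample (hypothetical sketch of LDSS).

    values   : (n_accessions, n_traits) array of predicted genotypic values
    fraction : sampling percentage expressed as a fraction, e.g. 0.2
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)

    # Standardized Euclidean distance: scale each trait by its standard
    # deviation (assumption: computed once, on the initial collection).
    z = x / x.std(axis=0, ddof=1)
    sq = (z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    dist = np.sqrt(np.clip(d2, 0.0, None))
    np.fill_diagonal(dist, np.inf)  # an accession never pairs with itself

    keep = np.ones(len(x), dtype=bool)
    target = max(1, round(fraction * len(x)))
    while keep.sum() > target:
        # Assumption: the least-distance subgroup is the closest remaining pair.
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        drop = rng.choice([i, j])  # keep one member, remove the other
        dist[drop, :] = np.inf
        dist[:, drop] = np.inf
        keep[drop] = False
    return np.flatnonzero(keep)


def coincidence_rate(values, core_idx):
    """CR (%): mean over traits of core range divided by initial range."""
    x = np.asarray(values, dtype=float)
    core = x[core_idx]
    ratio = (core.max(axis=0) - core.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
    return 100.0 * ratio.mean()


# Toy data: three tight pairs of accessions measured on two traits.
toy = np.array([[0.0, 0.0], [0.1, 0.0],
                [5.0, 5.0], [5.1, 5.0],
                [10.0, 0.0], [10.0, 0.1]])
core = ldss_core(toy, fraction=0.5)
print(core, coincidence_rate(toy, core))
```

Under this reading, the closest pair is the first merge produced by any of the four hierarchical methods, which is consistent with the reported finding that the clustering method did not affect the result.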