Department of Plant Breeding, Universidad Autónoma Agraria Antonio Narro, Saltillo, Coahuila, Mexico.
PLoS One. 2013 Nov 19;8(11):e79936. doi: 10.1371/journal.pone.0079936. eCollection 2013.
Bulk DNA sampling has been a valuable strategy for studying large numbers of individuals with genetic markers. We used information theory to analyze how this strategy can discriminate among germplasm sources, considering polymorphic alleles that are scored binarily for presence or absence in DNA pools. We defined the informativeness of a set of marker loci in bulks as the mutual information between genotype and population identity. This quantity is composed of two terms, diversity and noise. The first term is the entropy of bulk genotypes, and the noise term is the conditional entropy of bulk genotypes given germplasm sources. Optimizing marker information therefore means increasing diversity and reducing noise. We devised simple formulas to estimate marker information per allele from a set of allele frequencies estimated across populations. As an example, these formulas were used to optimize bulk size for SSR genotyping in maize, based on allele frequencies estimated in a sample of 56 maize populations. We found that a sample of 30 plants from a random-mating population is adequate for SSR characterization of maize germplasm. We also analyzed the use of divided bulks to overcome the allele dilution problem in DNA pools. We concluded that a sample of 30 plants divided into three bulks of 10 plants characterizes maize germplasm sources efficiently through SSR while keeping the dilution problem well controlled. Finally, we estimated the informativeness of 30 SSR loci from the allele frequencies estimated in maize populations. Marker informativeness varied widely and was positively correlated with the number of alleles per locus.
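The diversity/noise decomposition can be made concrete with a short calculation. The following Python sketch is illustrative only and is not the authors' published formula set. It rests on three assumptions. First, plants are diploid and come from random-mating populations, so an allele at frequency p is present in a bulk of n plants with probability q = 1 - (1 - p)^(2n). Second, every population receives equal prior weight. Third, detection is perfect, so allele dilution is not modeled.

Under these assumptions, the information carried by one allele is I = H(q_mean) - (1/K) * sum_k H(q_k). Here q_k is the presence probability in population k, q_mean is the average of the q_k over the K populations, and H is the binary entropy. The first term is the diversity and the second is the noise.

The function names are hypothetical. So is the divided-bulk extension, which scores how many of several sub-bulks show the allele.

```python
import math


def binary_entropy(q):
    """Entropy (bits) of a presence/absence variable with presence probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1.0 - q) * math.log2(1.0 - q))


def entropy(dist):
    """Entropy (bits) of a discrete distribution given as a list of probabilities."""
    return -sum(x * math.log2(x) for x in dist if x > 0.0)


def presence_prob(p, plants):
    """Assumption: diploid random-mating population, so a bulk of `plants`
    individuals samples 2 * plants gametes; returns P(allele present in bulk)."""
    return 1.0 - (1.0 - p) ** (2 * plants)


def allele_information(freqs, plants):
    """Mutual information (bits) between presence of one allele in a single bulk
    and population identity, with equal prior weight for each population.
    diversity = H(mean presence prob); noise = mean of H(presence prob)."""
    qs = [presence_prob(p, plants) for p in freqs]
    q_mean = sum(qs) / len(qs)
    diversity = binary_entropy(q_mean)
    noise = sum(binary_entropy(q) for q in qs) / len(qs)
    return diversity - noise


def divided_bulk_information(freqs, n_bulks, plants_per_bulk):
    """Hypothetical divided-bulk version: the genotype is the number of sub-bulks
    (0..n_bulks) showing the allele, Binomial(n_bulks, q_k) in population k."""
    dists = []
    for p in freqs:
        q = presence_prob(p, plants_per_bulk)
        dists.append([math.comb(n_bulks, x) * q**x * (1.0 - q) ** (n_bulks - x)
                      for x in range(n_bulks + 1)])
    # Marginal distribution of the sub-bulk count, averaged over populations.
    mixture = [sum(d[x] for d in dists) / len(dists) for x in range(n_bulks + 1)]
    diversity = entropy(mixture)
    noise = sum(entropy(d) for d in dists) / len(dists)
    return diversity - noise


if __name__ == "__main__":
    # Hypothetical frequencies of one SSR allele in four populations.
    freqs = [0.02, 0.1, 0.3, 0.0]
    for n in (5, 10, 20, 30, 60):
        print(n, "plants:", round(allele_information(freqs, n), 4))
    print("3 bulks x 10 plants:", round(divided_bulk_information(freqs, 3, 10), 4))
```

A 30-plant pool shows the allele exactly when at least one of its three 10-plant sub-bulks does. The divided design therefore cannot carry less information than the single 30-plant bulk in this idealized model. In the paper, the practical benefit of dividing the sample is control of allele dilution, and this sketch does not capture that effect.

The model also shows a general limit. For an allele present in every population, information eventually falls as bulk size grows, because presence becomes nearly certain everywhere.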