van Hintum Th J L
Centre for Genetic Resources, The Netherlands (CGN), Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands.
Theor Appl Genet. 2007 Aug;115(3):343-9. doi: 10.1007/s00122-007-0566-5. Epub 2007 May 15.
The results of genetic diversity studies using molecular markers depend not only on the biology of the studied objects but also on the quality of the marker data. Poor data quality may prevent biological questions from being answered correctly. A new statistic is proposed to estimate the quality of a marker data set with regard to its ability to describe the structure of the biological material under study. This statistic is called data resolution (DR). It is calculated by randomly splitting a marker data set into two sets, each containing half of the markers. Within each set, similarities between all pairs of objects are calculated, and the similarities obtained from the two sets are then correlated. This process is repeated a large number of times, and the average of the resulting correlation coefficients is the DR of the data set. In the present paper, the DR statistic is applied to four studies involving amplified fragment length polymorphism (AFLP) as well as microsatellite markers. In addition, some properties and possible applications of DR are discussed, including predicting the added value of scoring additional markers and determining which similarity measure is, apart from genetic considerations, most appropriate for analyzing the data.
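The DR procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. It assumes binary (band presence/absence) scores such as AFLP data. It uses the simple matching coefficient by default, with Jaccard included as an alternative. It uses the Pearson correlation coefficient, although the abstract says only that the similarities are "correlated". When the number of markers is odd, one marker is left out in each replicate so the two halves stay equal in size, and replicates in which the correlation is undefined (zero variance) are skipped. All function names are hypothetical, and microsatellite data would need their own coding and similarity measure.

```python
import random
from itertools import combinations
from math import sqrt


def simple_matching(a, b):
    """Fraction of markers on which two objects have the same 0/1 score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    """Shared bands over bands present in either object.

    Assumption: 0/0 (no bands in either object) is treated as 0.0.
    """
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return both / either if either else 0.0


def pearson(u, v):
    """Pearson correlation; returns None if either vector has zero variance."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = sum((x - mu) ** 2 for x in u)
    sv = sum((y - mv) ** 2 for y in v)
    if su == 0 or sv == 0:
        return None
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    return cov / sqrt(su * sv)


def pairwise_similarities(data, markers, similarity):
    """Similarity for every pair of objects, using only the given marker columns."""
    rows = [[row[m] for m in markers] for row in data]
    return [similarity(a, b) for a, b in combinations(rows, 2)]


def data_resolution(data, n_iter=1000, similarity=simple_matching, seed=None):
    """Hypothetical DR estimate: mean correlation of similarities from random marker halves.

    data: list of objects, each a list of 0/1 marker scores of equal length.
    """
    n_markers = len(data[0])
    if n_markers < 2:
        raise ValueError("need at least two markers to split")
    rng = random.Random(seed)
    half = n_markers // 2  # with an odd count, one marker is left out per replicate
    cols = list(range(n_markers))
    correlations = []
    for _ in range(n_iter):
        rng.shuffle(cols)  # random split of the markers into two halves
        s1 = pairwise_similarities(data, cols[:half], similarity)
        s2 = pairwise_similarities(data, cols[half:2 * half], similarity)
        r = pearson(s1, s2)
        if r is not None:  # skip replicates where the correlation is undefined
            correlations.append(r)
    if not correlations:
        raise ValueError("correlation undefined in every replicate")
    return sum(correlations) / len(correlations)
```

Two data sets illustrate the behaviour. A matrix with two groups that every marker separates consistently gives a DR close to 1. Random 0/1 scores give a DR near 0, because the two halves then carry no shared structure. Calling `data_resolution` with `similarity=simple_matching` and then with `similarity=jaccard` on the same matrix roughly mirrors the comparison of similarity measures mentioned in the abstract. One plausible reading is that the measure with the higher DR describes the structure more consistently.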