Luo Arong, Lan Haiqiang, Ling Cheng, Zhang Aibing, Shi Lei, Ho Simon Y W, Zhu Chaodong
Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China.
Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming 650221, China.
Ecol Evol. 2015 Dec 1;5(24):5869-79. doi: 10.1002/ece3.1846. eCollection 2015 Dec.
For some groups of organisms, DNA barcoding can provide a useful tool in taxonomy, evolutionary biology, and biodiversity assessment. However, the efficacy of DNA barcoding depends on the degree of sampling per species, because a sufficiently large sample is needed to estimate genetic polymorphism reliably and to delimit species. We used a simulation approach to examine the effects of sample size on four estimators of genetic polymorphism related to DNA barcoding: mismatch distribution, nucleotide diversity, the number of haplotypes, and maximum pairwise distance. Our results showed that mismatch distributions derived from subsamples of ≥20 individuals usually bore a close resemblance to that of the full dataset. Estimates of nucleotide diversity from subsamples of ≥20 individuals tended to follow a bell-shaped distribution around the value for the full dataset, whereas estimates from smaller subsamples did not. As expected, greater sampling generally led to an increase in the number of haplotypes detected. We also found that subsamples of ≥20 individuals allowed a good estimate of the maximum pairwise distance of the full dataset, whereas smaller subsamples were associated with a high probability of underestimation. Overall, our study confirms the expectation that larger samples are beneficial for the efficacy of DNA barcoding and suggests that a minimum sample size of 20 individuals is needed in practice for each population.
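The subsampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation code. It assumes aligned sequences of equal length, uses uncorrected differences between sequences (the study may have used a different distance model), and draws its data from a hypothetical helper, `toy_population`, that stands in for a real barcode dataset. All function names here are illustrative.

```python
import random
from collections import Counter
from itertools import combinations


def summarize(seqs):
    """Compute the four estimators for a list of aligned, equal-length sequences."""
    length = len(seqs[0])
    # Number of differing sites for every pair of individuals.
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]
    return {
        "mismatch": Counter(diffs),                # mismatch distribution
        "pi": sum(diffs) / (len(diffs) * length),  # nucleotide diversity per site
        "haplotypes": len(set(seqs)),              # number of distinct haplotypes
        "max_dist": max(diffs) / length,           # maximum pairwise distance
    }


def subsample_estimates(seqs, n, replicates, rng):
    """Draw `replicates` random subsamples of n individuals and summarize each."""
    return [summarize(rng.sample(seqs, n)) for _ in range(replicates)]


def toy_population(size, length, n_haplotypes, rng):
    """Hypothetical stand-in data: a few haplotypes at unequal frequencies."""
    ancestor = [rng.choice("ACGT") for _ in range(length)]
    haplotypes = []
    for _ in range(n_haplotypes):
        seq = ancestor[:]
        # Redraw the base at three random sites (a site may keep its base).
        for site in rng.sample(range(length), 3):
            seq[site] = rng.choice("ACGT")
        haplotypes.append("".join(seq))
    # Earlier haplotypes are more common than later ones.
    weights = [1 / (rank + 1) for rank in range(n_haplotypes)]
    return rng.choices(haplotypes, weights=weights, k=size)


if __name__ == "__main__":
    rng = random.Random(0)
    full = toy_population(size=100, length=40, n_haplotypes=12, rng=rng)
    truth = summarize(full)
    for n in (5, 10, 20, 40):
        reps = subsample_estimates(full, n, replicates=100, rng=rng)
        # Fraction of subsamples that underestimate the full-dataset maximum.
        under = sum(r["max_dist"] < truth["max_dist"] for r in reps) / len(reps)
        mean_haps = sum(r["haplotypes"] for r in reps) / len(reps)
        print(f"n={n}: underestimation rate={under:.2f}, mean haplotypes={mean_haps:.1f}")
```

Because a subsample contains only pairs that also occur in the full dataset, its maximum pairwise distance can never exceed the full-dataset value. Sampling error for this estimator therefore shows up only as underestimation, which is why the abstract frames it that way.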