Assessing statistical power of SNPs for population structure and conservation studies.

Author information

Southwest Fisheries Science Center, 8604 La Jolla Shores Drive, La Jolla, CA 92037, USA.

Publication information

Mol Ecol Resour. 2009 Jan;9(1):66-73. doi: 10.1111/j.1755-0998.2008.02392.x. Epub 2008 Oct 21.

Abstract

Single nucleotide polymorphisms (SNPs) have been proposed by some as the new frontier for population studies, and several papers have presented theoretical and empirical evidence reporting the advantages and limitations of SNPs. As a practical matter, however, it remains unclear how many SNP markers will be required or what the optimal characteristics of those markers should be in order to obtain sufficient statistical power to detect different levels of population differentiation. We use a hypothetical case to illustrate the process of designing a population genetics project, and present results from simulations that address several issues for maximizing statistical power to detect differentiation while minimizing the amount of effort in developing SNPs. Results indicate that (i) while ~30 SNPs should be sufficient to detect moderate (F(ST) = 0.01) levels of differentiation, studies aimed at detecting demographic independence (e.g. F(ST) < 0.005) may require 80 or more SNPs and large sample sizes; (ii) different SNP allele frequencies have little effect on power, and thus, selection of SNPs can be relatively unbiased; (iii) increasing the sample size has a strong effect on power, so that the number of loci can be minimized when sample number is known, and increasing sample size is almost always beneficial; and (iv) power is increased by including multiple SNPs within loci and inferring haplotypes, rather than trying to use only unlinked SNPs. This also has the practical benefit of reducing the SNP ascertainment effort, and may influence the decision of whether to seek SNPs in coding or noncoding regions.
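
To illustrate the kind of simulation the abstract describes, the following is a minimal sketch of a power calculation for unlinked biallelic SNPs. It is not the authors' procedure: the abstract does not state their simulation model or test statistic. The sketch assumes two populations whose allele frequencies are drawn from a beta distribution around a shared ancestral frequency (the Balding-Nichols model), ancestral frequencies drawn uniformly from 0.1 to 0.9, and a test that sums per-locus 2x2 chi-square statistics. The function names and defaults are hypothetical, and the critical value uses the Wilson-Hilferty approximation to stay within the standard library.

```python
import random
from statistics import NormalDist


def chi2_critical(df, alpha=0.05):
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def sample_count(rng, n_alleles, freq):
    """Binomial draw: copies of the reference allele among n_alleles."""
    return sum(rng.random() < freq for _ in range(n_alleles))


def simulate_power(fst, n_loci, n_ind, reps=200, alpha=0.05, seed=1):
    """Fraction of replicates in which two populations test as differentiated.

    Assumed model (not taken from the paper): Balding-Nichols allele
    frequencies, diploid individuals, unlinked loci, summed chi-square test.
    """
    rng = random.Random(seed)
    n = 2 * n_ind  # alleles sampled per population (diploid)
    hits = 0
    for _rep in range(reps):
        stat, df = 0.0, 0
        for _locus in range(n_loci):
            p = rng.uniform(0.1, 0.9)  # assumed ancestral frequency range
            counts = []
            for _pop in range(2):
                if fst > 0:
                    scale = (1.0 - fst) / fst
                    q = rng.betavariate(p * scale, (1.0 - p) * scale)
                else:
                    q = p  # no differentiation: both populations share p
                counts.append(sample_count(rng, n, q))
            total = counts[0] + counts[1]
            if total == 0 or total == 2 * n:
                continue  # monomorphic in the sample: carries no information
            a, c = counts
            b, d = n - a, n - c
            # Pearson chi-square for a 2x2 table with row totals n and n
            stat += (2 * n) * (a * d - b * c) ** 2 / (
                n * n * total * (2 * n - total)
            )
            df += 1
        if df > 0 and stat > chi2_critical(df, alpha):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for n_loci in (10, 30, 80):
        for n_ind in (25, 50, 100):
            power = simulate_power(0.01, n_loci, n_ind)
            print(f"loci={n_loci:3d} n={n_ind:3d} power={power:.2f}")
```

Running the script prints estimated power for a small grid of locus counts and sample sizes at F(ST) = 0.01. Setting `fst=0.0` gives the false-positive rate, which should sit near `alpha` if the test is well calibrated under these assumptions.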
