Morton Brian R, Dar Vaqaar-un-Nisa, Wright Stephen I
Department of Biological Science, Barnard College, Columbia University, New York, New York 10027, USA.
Plant Physiol. 2009 Feb;149(2):616-24. doi: 10.1104/pp.108.127787. Epub 2008 Nov 19.
Previous studies have shown that the pattern of single nucleotide polymorphism (SNP) in Arabidopsis (Arabidopsis thaliana) deviates from the distribution expected under a neutral model. Here, we test whether ancestral misinference could explain this deviation. We start by showing that sequence context has significant and complex influences on mutation dynamics in Arabidopsis, as inferred from SNP frequency, and we compare the results to observations about context dependency made in a previous analysis of a maize (Zea mays) SNP dataset. The data concerning heterogeneity across sites are then used to correct for ancestral misinference in a context-dependent manner. Using Arabidopsis lyrata to infer the ancestral state for SNPs, we show that the resulting unfolded site frequency spectrum (SFS) in Arabidopsis is skewed toward sites with high-frequency derived nucleotides. Sites are also partitioned into two general functional classes: second codon positions and 4-fold degenerate sites. These two classes show different SFS. Although both show an overrepresentation of high-frequency derived sites, low-frequency derived sites are vastly overrepresented at second codon positions but significantly underrepresented at 4-fold degenerate sites. We find that these results are robust to corrections for ancestral misinference, even when context-dependent variation in mutation properties is taken into consideration. The data suggest that, in addition to purifying selection, complex demographic events and/or linked positive selection need to be invoked to explain the SFS, and they highlight the importance of sequence context in analyses of genome-wide variation.
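As an illustration of the unfolded SFS mentioned above, the following minimal sketch polarizes biallelic SNPs with an outgroup allele and tallies derived-allele counts. It is not the authors' pipeline: the function name `unfolded_sfs`, the input layout, and the example data are assumptions made for illustration, and no correction for ancestral misinference is applied.

```python
from collections import Counter


def unfolded_sfs(snps, sample_size):
    """Return a list whose entry k-1 is the number of SNPs with derived count k.

    snps: iterable of (allele_counts, outgroup_allele) pairs, where
    allele_counts maps each allele to its count in the sample (hypothetical layout).
    """
    counts = Counter()
    for allele_counts, outgroup_allele in snps:
        # Keep only biallelic sites with complete data.
        if len(allele_counts) != 2 or sum(allele_counts.values()) != sample_size:
            continue
        # If the outgroup matches neither allele, the site cannot be polarized.
        if outgroup_allele not in allele_counts:
            continue
        # The allele that differs from the outgroup is treated as derived.
        derived = next(a for a in allele_counts if a != outgroup_allele)
        counts[allele_counts[derived]] += 1
    return [counts[k] for k in range(1, sample_size)]


# Made-up example data for a sample of 10 chromosomes.
example_snps = [
    ({"A": 9, "G": 1}, "A"),  # derived G, count 1
    ({"C": 2, "T": 8}, "C"),  # derived T, count 8
    ({"A": 5, "T": 5}, "G"),  # outgroup matches neither allele: skipped
    ({"G": 7, "C": 3}, "G"),  # derived C, count 3
]
sfs = unfolded_sfs(example_snps, sample_size=10)
print(sfs)
```

Treating the outgroup allele as ancestral is exactly the step that ancestral misinference can corrupt: a second mutation on the outgroup lineage would make a low-frequency derived allele look like a high-frequency one.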