Mouse Biology Unit, European Molecular Biology Laboratory (EMBL), via Ramarini 32, 00015 Monterotondo, Italy.
Evol Bioinform Online. 2012;8:611-22. doi: 10.4137/EBO.S10194. Epub 2012 Nov 14.
We propose a novel and simple approach to elucidate genomic patterns of divergence using principal component analysis (PCA). We applied this methodology to the metric space generated by M. musculus genome-wide SNPs. Distance profiles were computed between M. musculus and the closely related species M. spretus, which was used as an external reference. While the speciation dynamics were apparent in the first principal component, differentiation within M. musculus gave rise to three minor components. We were unable to obtain a clear divergence signature discriminating laboratory strains, suggesting a stronger effect of genetic drift. These results contrast with those for wild strains, which exhibit well-defined deterministic signals of divergence. Finally, we were able to rank novel and previously known genes according to their likelihood of being under selective pressure. In conclusion, we posit PCA as a robust methodology for identifying diverging DNA regions without any a priori forcing.
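The abstract does not spell out the computation, so the following is only a minimal sketch of the general idea (PCA applied to per-region distance profiles against an external reference), not the authors' pipeline. The function name `pca_distance_profiles`, the windows-by-strains matrix layout, and the synthetic data are all assumptions made for illustration; the choice of four components simply mirrors the one major and three minor components mentioned above.

```python
import numpy as np

def pca_distance_profiles(profiles, n_components=4):
    """PCA of a (windows x strains) matrix of distances to a reference.

    Hypothetical helper: each row is a genomic window, each column a strain,
    and each entry a distance between that strain and the external reference.
    """
    X = np.asarray(profiles, dtype=float)
    # Center each strain's column so components describe variation across windows.
    Xc = X - X.mean(axis=0, keepdims=True)
    # SVD of the centered matrix; rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Scores: projection of every window onto the leading components.
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic stand-in data (assumption, not real SNP distances): a divergence
# signal shared by all strains plus small strain-specific noise.
rng = np.random.default_rng(0)
n_windows, n_strains = 500, 8
shared = rng.gamma(shape=2.0, scale=1.0, size=(n_windows, 1))
profiles = shared + 0.1 * rng.normal(size=(n_windows, n_strains))

scores, loadings, explained = pca_distance_profiles(profiles)

# Rank windows by the magnitude of their score on the second component,
# as one possible way to prioritize regions that differ among strains.
ranking = np.argsort(-np.abs(scores[:, 1]))
```

In this toy setup the first component absorbs the signal shared by all strains, loosely analogous to divergence from the reference species, while the later components capture differences among strains. Real distance profiles would replace the synthetic matrix, and how windows and distances are defined would follow the paper's methods rather than this sketch.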