Department of Bioinformatics, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA.
Genome Biol. 2008 Apr 8;9(4):R69. doi: 10.1186/gb-2008-9-4-r69.
The rate of molecular evolution of a protein-coding gene depends on the stringency of its functional and structural constraints. The Ka/Ks ratio is commonly used as an indicator of selective constraint and is typically calculated from interspecies alignments. The recent accumulation of single nucleotide polymorphism (SNP) data has made it possible to derive an analogous ratio from polymorphism within a species (the SNP A/S ratio).
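To make the quantity concrete, here is a minimal sketch of one way a per-gene SNP A/S ratio could be computed. The abstract does not state how the ratio is normalized, so everything in it is an assumption: the function names `site_counts` and `snp_a_s_ratio` are hypothetical, the site counting is a simplified Nei-Gojobori-style scheme under the standard genetic code, and changes to or from a stop codon are counted as nonsynonymous. It should not be read as the authors' pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def site_counts(cds):
    """Return (nonsynonymous_sites, synonymous_sites) for an in-frame CDS.

    Assumption: each of the three possible changes at a position
    contributes 1/3 of a site; a trailing incomplete codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    syn_sites = 0.0
    for start in range(0, usable, 3):
        codon = cds[start:start + 3]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    syn_sites += 1.0 / 3.0
    return usable - syn_sites, syn_sites


def snp_a_s_ratio(cds, snps):
    """Return (A / A_sites) / (S / S_sites), or None if no synonymous SNP.

    `snps` is a list of (zero_based_position, alternate_base) pairs,
    a hypothetical input format chosen for this illustration.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    nonsyn = syn = 0
    for pos, alt in snps:
        alt = alt.upper()
        if not 0 <= pos < usable or alt == cds[pos]:
            raise ValueError(f"invalid SNP at position {pos}")
        start = pos - pos % 3
        codon = cds[start:start + 3]
        offset = pos - start
        mutant = codon[:offset] + alt + codon[offset + 1:]
        if CODON_TABLE[mutant] == CODON_TABLE[codon]:
            syn += 1
        else:
            nonsyn += 1
    a_sites, s_sites = site_counts(cds)
    if syn == 0:
        return None  # ratio undefined without synonymous SNPs
    return (nonsyn / a_sites) / (syn / s_sites)


if __name__ == "__main__":
    # Toy gene Met-Ala-Lys with two synonymous and two nonsynonymous SNPs.
    toy_snps = [(5, "C"), (8, "G"), (3, "A"), (1, "C")]
    print(snp_a_s_ratio("ATGGCTAAA", toy_snps))
```

Under these assumptions, a ratio well below 1 means that amino-acid-altering SNPs are underrepresented relative to synonymous ones, which is the pattern expected under stronger selective constraint.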
Using data from the dbSNP database, we conducted the first large-scale survey of SNP A/S ratios across different structural and functional properties. We confirmed that the SNP A/S ratio is largely correlated with Ka/Ks for divergence. We observed stronger selective constraints for proteins that have high mRNA expression levels or broad expression patterns, have no paralogs, arose earlier in evolution, have natively disordered regions, are located in the cytoplasm and nucleus, or are related to human diseases. At the residue level, we found higher degrees of variation for residues that are exposed to solvent, are in a loop conformation, lie in natively disordered or low-complexity regions, or are in the signal peptides of secreted proteins. Our analysis also revealed that histones and protein kinases are among the protein families under the strongest selective constraints, whereas olfactory and taste receptors are among the most variable groups.
Our study suggests that the SNP A/S ratio is a robust measure of selective constraint. The correlations between SNP A/S ratios and other variables provide valuable insights into the natural selection acting on various structural and functional properties, particularly for human-specific genes and for constraints within the human lineage.