ETH Zürich, Zürich, Switzerland.
Genome Biol Evol. 2009 Jun 5;1:114-8. doi: 10.1093/gbe/evp012.
Published estimates of the proportion of positively selected genes (PSGs) in humans vary by three orders of magnitude. In mammals, estimates of the proportion of PSGs cover an even wider range of values. We used 2,980 orthologous protein-coding genes from human, chimpanzee, macaque, dog, cow, rat, and mouse, together with an established phylogenetic topology, to infer the fraction of PSGs in all seven terminal branches. The inferred fraction of PSGs ranged from 0.9% in human through 17.5% in macaque to 23.3% in dog. We found three factors that influence the fraction of genes exhibiting telltale signs of positive selection: sequence quality, the degree of misannotation, and ambiguities in the multiple sequence alignment. The inferred fraction of PSGs among sequences that are deficient in all three criteria (coverage, annotation, and alignment) is 7.2 times higher than among genes with high trace sequencing coverage, "known" annotation status, and perfect alignment scores. We conclude that some estimates of the prevalence of positive Darwinian selection in the literature may be inflated and should be treated with caution.