Golding Brian
Department of Biology, McMaster University, 1280 Main Street West, Hamilton, Ontario, L8S 4K1, Canada.
Evolution. 1993 Oct;47(5):1420-1431. doi: 10.1111/j.1558-5646.1993.tb02164.x.
Selection can have a significant effect on sequence evolution and this will be reflected in the information contained within the phylogenetic relationships between species. Selection will reduce the frequency of any deleterious nucleotides, and this can be used to test for the presence of selection. The frequencies of different nucleotides can be predicted theoretically and compared to observed values. If a sample of sequences has an unusually low frequency of a particular nucleotide then selection might be inferred to have acted upon these sequences. This conclusion can be true only if the sequences are not too closely related and if sufficient mutations have occurred during their evolution. Otherwise, the unusual pattern of nucleotides in the sequences may be caused by recent common ancestry. An algorithm is presented to obtain maximum-likelihood estimates of selection coefficients using the phylogenetic information contained within sequence data. A k-allele model is developed that uses the phylogeny to measure relative mutation rates and degrees of relatedness and to evaluate the likelihood in the presence of selection. The method is illustrated with examples from the NS2 genes of influenza viruses and the MHC genes of mice. It is shown that the maximum-likelihood estimates of mutation rates are very large for influenza viruses and that statistically significant selection acts to maintain a specific coding sequence. Overall, the MHC genes also have significant selection to preserve the coding sequence, but at the antigen recognition site, this selection is reversed to promote genetic variation. Maximum-likelihood estimates of these selection coefficients are provided.