Yang Ziheng, Swanson Willie J
Galton Laboratory, Department of Biology, University College London, 4 Stephenson Way, London NW1 2HE, UK.
Mol Biol Evol. 2002 Jan;19(1):49-57. doi: 10.1093/oxfordjournals.molbev.a003981.
The nonsynonymous to synonymous substitution rate ratio (omega = dN/dS) provides a sensitive measure of selective pressure at the protein level, with omega values <1, =1, and >1 indicating purifying selection, neutral evolution, and diversifying selection, respectively. Recently developed maximum likelihood models of codon substitution account for variable selective pressures among amino acid sites by employing a statistical distribution for the omega ratio among sites. These models, called random-sites models, are suitable when we do not know a priori which sites are under what kind of selective pressure. Sometimes prior information (such as the tertiary structure of the protein) is available to partition sites in the protein into classes that are expected to be under different selective pressures, and it is then sensible to use such information in the model. In this paper, we implement maximum likelihood models for prepartitioned data sets that account for heterogeneity among site partitions by using different omega parameters for the partitions. These models, referred to as fixed-sites models, are also useful for the combined analysis of multiple genes from the same set of species. We apply the models to data sets of major histocompatibility complex (MHC) class I alleles from human populations and of abalone sperm lysin genes. Structural information is used to partition sites in MHC into two classes: those in the antigen recognition site (ARS) and those outside it. The fixed-sites models detect positive selection in the ARS. Similarly, sites in lysin are classified as buried or solvent-exposed according to the tertiary structure, and positive selection is detected at the solvent-exposed sites. The random-sites models identify a number of sites under positive selection in each data set, confirming and elaborating the results of the fixed-sites models. The analysis demonstrates the utility of the fixed-sites models, as well as the power of the previous random-sites models, which do not use prior information to partition sites.
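The core idea of the fixed-sites models (one omega parameter per prepartitioned class of sites) can be illustrated with a minimal sketch of a codon rate matrix. The sketch assumes a Goldman-Yang-style codon model: instantaneous changes are allowed at only one codon position, the rate is proportional to the target codon's equilibrium frequency, transitions are scaled by kappa, and nonsynonymous changes are scaled by omega. Here the partitions share kappa and codon frequencies and differ only in omega; the partition names ("ARS", "non-ARS") and all parameter values are hypothetical and for illustration only. The helper names (`selection_regime`, `codon_rate`, `rate_matrix`) are likewise hypothetical and are not taken from any published implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ..., GGG).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SENSE = [c for c, aa in GENETIC_CODE.items() if aa != "*"]  # the 61 sense codons
TRANSITIONS = {frozenset("TC"), frozenset("AG")}


def selection_regime(omega, tol=1e-9):
    """Interpret omega = dN/dS: <1 purifying, =1 neutral, >1 diversifying."""
    if omega < 1 - tol:
        return "purifying"
    if omega > 1 + tol:
        return "diversifying"
    return "neutral"


def codon_rate(i, j, kappa, omega, pi):
    """Rate from sense codon i to sense codon j (assumed Goldman-Yang-style form)."""
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:
        return 0.0  # i == j, or changes at two or three positions at once
    rate = pi[j]  # proportional to the equilibrium frequency of the target codon
    if frozenset(diffs[0]) in TRANSITIONS:
        rate *= kappa  # transition/transversion rate ratio
    if GENETIC_CODE[i] != GENETIC_CODE[j]:
        rate *= omega  # nonsynonymous changes are scaled by omega
    return rate


def rate_matrix(kappa, omega, pi):
    """61 x 61 rate matrix as nested lists; each row sums to zero."""
    q = []
    for a, i in enumerate(SENSE):
        row = [codon_rate(i, j, kappa, omega, pi) for j in SENSE]  # diagonal is 0.0 here
        row[a] = -sum(row)
        q.append(row)
    return q


# Hypothetical partitions and parameter values, for illustration only:
# the partitions share kappa and codon frequencies, but each has its own omega.
pi_equal = {c: 1 / len(SENSE) for c in SENSE}
partition_omegas = {"ARS": 2.0, "non-ARS": 0.2}
Q = {name: rate_matrix(kappa=2.0, omega=w, pi=pi_equal) for name, w in partition_omegas.items()}

for name, w in partition_omegas.items():
    print(name, w, selection_regime(w))
```

In a full maximum likelihood analysis, each partition's matrix would be exponentiated to obtain transition probabilities along branches (for example with `scipy.linalg.expm`), and the log-likelihoods of the partitions would be summed. That step is omitted here. Letting only omega differ among partitions is the simplest form of heterogeneity; a model could also allow kappa or codon frequencies to vary among partitions.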