Gu Xun
Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, Iowa 50011, and State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai 200433, China.
Genetics. 2014 Aug;197(4):1357-63. doi: 10.1534/genetics.114.164673. Epub 2014 Jun 3.
Although pleiotropy, the capacity of a gene to affect multiple phenotypes, is well known as a common property of genes, estimating it quantitatively remains a great challenge because of the complexity of phenotypes. Not surprisingly, general readers find it hard to understand how gene pleiotropy can be effectively estimated from genetic data without counting phenotypes. In this article we discuss in detail the Gu-2007 method, which estimates pleiotropy from protein sequence analysis. We show that this method actually estimates the rank (K) of the genotype-phenotype mapping, which can be written concisely as K = min(r, Pmin), where Pmin is the minimum pleiotropy among all legitimate measures, including the fitness components, and r is the rank of mutational effects at an amino acid site. Taken together, the effective gene pleiotropy (Ke) estimated by the Gu-2007 method has the following meanings: (i) Ke is an estimate of K = min(r, Pmin), the rank of the genotype-phenotype map; (ii) Ke is an estimate of the minimum pleiotropy Pmin only if Pmin < r; (iii) the Gu-2007 method attempts to estimate the pleiotropy of amino acid sites, a conservative proxy for the true gene pleiotropy; (iv) with a phylogeny large enough that the rank of mutational effects at an amino acid site approaches 19 (r → 19), one can estimate Pmin between 1 and 19; and (v) Ke is a conservative estimate of K because fitness components that are only slightly affected are effectively removed by the estimation procedure. In addition, we conclude that mutational pleiotropy (the number of traits affected by a single mutation) cannot be estimated without knowing the phenotypes.
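The relation K = min(r, Pmin) and the condition in point (ii) can be illustrated with a minimal Python sketch. This is not the Gu-2007 estimation procedure, which works from protein sequence data; it only evaluates the stated formula for given values. The function names `map_rank` and `p_min_identifiable` and the constant `MAX_SITE_RANK` are hypothetical, and the upper bound of 19 on r is taken from the "r → 19" limit in the abstract.

```python
# Illustrative only: evaluates K = min(r, Pmin) for given inputs.
# It does not estimate anything from sequence data.

MAX_SITE_RANK = 19  # assumed upper bound on r, from "r -> 19" in the abstract


def map_rank(r: int, p_min: int) -> int:
    """Return K = min(r, Pmin), the rank of the genotype-phenotype map."""
    if not 1 <= r <= MAX_SITE_RANK:
        raise ValueError(f"r must be between 1 and {MAX_SITE_RANK}")
    if p_min < 1:
        raise ValueError("Pmin must be at least 1")
    return min(r, p_min)


def p_min_identifiable(r: int, p_min: int) -> bool:
    """Point (ii): K recovers Pmin only when Pmin < r."""
    return p_min < r


# Large phylogeny (r = 19): K equals Pmin, so Pmin is recoverable.
k_large = map_rank(19, 6)
# Small phylogeny (r = 4): K is capped at r and underestimates Pmin = 10.
k_small = map_rank(4, 10)
```

In the second case K equals r rather than Pmin, which is why the abstract describes the estimate as a lower bound on pleiotropy unless the phylogeny is large enough.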