Kostetskiĭ P V, Vladimirova R R
Mol Biol (Mosk). 1992 Jul-Aug;26(4):859-68.
A set of aligned homologous protein sequences is divided into two groups containing the m and k most closely related sequences, respectively. The position variability at a given alignment position is defined as the number of mismatches among all k x m intergroup pairs of amino acid residues at that position, divided by k x m. Plotting the position variability against the sequence position number with a window of 10 positions gives the intergroup local variability profile. The area S enclosed between the local variability profile and the straight line corresponding to the mean local variability is compared with the average area S(r) obtained for 1000 random homologous protein families. If S exceeds S(r) by more than 2 standard deviations sigma r, the local variability profile is assumed to contain peaks and troughs corresponding to significantly variable and conserved regions of the sequences. The profile extrema containing the area surplus delta S = S - (S(r) + 2 sigma r) are cut off by two straight lines to locate the significant regions. A numerical experiment on the family of homologous phospholipases A2 revealed that S(r) and sigma r depend linearly on sigma v, the standard deviation of the position variability of the homologous sequences. Furthermore, it was shown for protein families of various lengths (rhodopsins, aspartate aminotransferases, cytochromes b, the L and M subunits of the photoreaction centre of photosynthetic bacteria, and the alpha subunits of Na,K-ATPase) that delta S = S - n(S'r + 2 sigma r), where S is the area of the local variability profile and n = L/l (L is the length of the given protein family and l is the length of the hypothetical protein domain). If l = 250, then S'r = -1.42 + 62.56 sigma v and sigma'r = -0.14 + 7.46 sigma v.
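
The quantities defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the window is taken as a simple unweighted moving average, the area S is approximated as the sum of absolute deviations of the profile from its mean (unit step per position), and the scaled threshold uses sigma'r from the l = 250 fit. All function names are hypothetical. Because the units in which the original areas were measured are not stated, the absolute values of delta S computed this way may not be comparable with the published ones.

```python
import numpy as np

def position_variability(group_a, group_b):
    """Fraction of mismatching intergroup residue pairs at each aligned position."""
    a = np.array([list(s) for s in group_a])  # shape (m, L)
    b = np.array([list(s) for s in group_b])  # shape (k, L)
    # Broadcast to compare every sequence of one group with every sequence
    # of the other: boolean array of shape (m, k, L).
    mismatches = a[:, None, :] != b[None, :, :]
    # Mismatch count divided by k x m, per position.
    return mismatches.mean(axis=(0, 1))

def local_profile(variability, window=10):
    """Local variability profile (assumption: unweighted moving average)."""
    kernel = np.ones(window) / window
    return np.convolve(variability, kernel, mode="valid")

def profile_area(profile):
    """Area between the profile and its mean line (assumption: unit step)."""
    return float(np.abs(profile - profile.mean()).sum())

def area_surplus(S, sigma_v, L, l=250):
    """delta S = S - n(S'r + 2 sigma'r) with the linear fits quoted for l = 250.

    Assumption: the deviation term inside the bracket is sigma'r.
    """
    n = L / l
    s_r = -1.42 + 62.56 * sigma_v
    sigma_r = -0.14 + 7.46 * sigma_v
    return S - n * (s_r + 2 * sigma_r)
```

With two toy groups `["AC", "AC"]` and `["AD", "AC", "AE"]`, `position_variability` returns 0 for the first position (all residues identical) and 4/6 for the second (four of the six intergroup pairs differ). A positive value from `area_surplus` would correspond to a profile whose area exceeds the random expectation by more than two standard deviations.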