Torney D C, Whittaker C C, Xie G
Theoretical Division and U.S.D.O.E. Joint Genome Institute, Mail Stop K710, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA.
J Mol Biol. 1999 Mar 12;286(5):1461-9. doi: 10.1006/jmbi.1998.2567.
We introduce a generally applicable method for discovering and quantifying all of the characteristic statistical properties of a class of biological sequences, given examples from the class. This method employs a reversible encoding of sequences into the binary digits -1 and +1. Then, provided that the sample is sufficient, the sample cumulants on the subsets of digit positions will manifest all of the statistical properties of the class. As an illustration, we present the main results of a complete characterization of the stationary statistical properties of human coding sequences, in terms of their sample cumulants. Many of the telling sample cumulants are described.
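
To make the two steps in the abstract concrete (a reversible -1/+1 encoding, then sample cumulants on subsets of digit positions), here is a minimal Python sketch. It is illustrative only and not the authors' implementation. The two-digit code per nucleotide in ENCODE is an assumption, since the abstract does not give the mapping used in the paper. The function names are hypothetical. The estimator is the simple plug-in form, in which the standard moment-to-cumulant partition formula is evaluated on sample moments; it is not an unbiased k-statistic.

```python
from math import factorial

# Assumed mapping: two -1/+1 digits per nucleotide (not taken from the paper).
ENCODE = {"A": (-1, -1), "C": (-1, 1), "G": (1, -1), "T": (1, 1)}
DECODE = {digits: base for base, digits in ENCODE.items()}


def encode(seq):
    """Map a nucleotide string to a flat list of -1/+1 digits."""
    return [d for base in seq for d in ENCODE[base]]


def decode(digits):
    """Invert encode(); the encoding is reversible by construction."""
    return "".join(
        DECODE[(digits[i], digits[i + 1])] for i in range(0, len(digits), 2)
    )


def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


def sample_moment(rows, positions):
    """Average, over the sample, of the product of digits at `positions`."""
    total = 0
    for row in rows:
        prod = 1
        for pos in positions:
            prod *= row[pos]
        total += prod
    return total / len(rows)


def sample_cumulant(rows, positions):
    """Plug-in joint cumulant of the digits at `positions`.

    Sums (-1)^(k-1) * (k-1)! * (product of block moments) over all
    partitions of `positions` into k blocks.
    """
    kappa = 0.0
    for partition in set_partitions(list(positions)):
        k = len(partition)
        term = (-1) ** (k - 1) * factorial(k - 1)
        for block in partition:
            term *= sample_moment(rows, block)
        kappa += term
    return kappa


# Toy sample: two equal-length sequences encoded as rows of digits.
rows = [encode("AA"), encode("TT")]
pair_cumulant = sample_cumulant(rows, (0, 1))  # covariance of digits 0 and 1
```

For a single position the function returns the sample mean of that digit, and for a pair of positions it returns their sample covariance. Higher-order subsets pick up dependencies that pairwise statistics miss. The number of set partitions grows quickly with subset size, so this direct form is only practical for small subsets.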