Michel C J
J Theor Biol. 1986 May 21;120(2):223-36. doi: 10.1016/s0022-5193(86)80176-x.
We propose a new approach to studying protein-coding and non-coding regions in DNA sequences that combines two complementary statistical methods. Principal component analysis (PCA) is a graphical method for representing DNA sequences characterized by a set of quantitative parameters; it serves as an aid to intuition. Discriminant analysis (DA) is a quantitative method for classifying DNA sequences; it provides an evaluation of the first method and leads to a decision. The value of this approach is supported by the fact that it recovers results recently described in the literature. Furthermore, this general methodology has allowed us to show that there exist parameters which identify the functional domains of nucleic acid sequences without relying on the properties of the genetic code.
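The two-step idea described in the abstract (a PCA view of parameterized sequences, followed by a discriminant rule that classifies them) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two parameters (G+C fraction and purine fraction), the toy sequences and their labels, and the use of Fisher's two-class linear discriminant with a midpoint threshold as the DA step.

```python
import math

def features(seq):
    """Two hypothetical parameters: G+C fraction and purine (A+G) fraction."""
    seq = seq.upper()
    n = len(seq)
    gc = (seq.count("G") + seq.count("C")) / n
    purine = (seq.count("A") + seq.count("G")) / n
    return (gc, purine)

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around the mean."""
    mx, my = mean(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy

def first_principal_axis(points):
    """Unit eigenvector for the largest eigenvalue of the 2x2 covariance matrix."""
    a, b, c = (s / (len(points) - 1) for s in scatter(points))
    lam = (a + c) / 2 + math.hypot((a - c) / 2, b)  # largest eigenvalue
    if b != 0:
        vx, vy = b, lam - a
    else:  # covariance already diagonal: pick the axis with larger variance
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

def fit_discriminant(class_a, class_b):
    """Fisher's linear discriminant: w = Sw^-1 (mean_a - mean_b)."""
    ma, mb = mean(class_a), mean(class_b)
    # Pooled within-class scatter Sw = S_a + S_b
    sxx, sxy, syy = (p + q for p, q in zip(scatter(class_a), scatter(class_b)))
    det = sxx * syy - sxy ** 2  # assumed non-zero in this sketch
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    # Decision threshold: projection of the midpoint between the class means
    threshold = w[0] * (ma[0] + mb[0]) / 2 + w[1] * (ma[1] + mb[1]) / 2
    return w, threshold

def classify(seq, w, threshold):
    x, y = features(seq)
    return "coding" if w[0] * x + w[1] * y > threshold else "non-coding"

# Toy sequences invented for this sketch; the labels are illustrative only.
CODING = ["ATGGCCGCGG", "ATGGCGCCCT", "ATGCCGGGAG"]
NONCODING = ["ATATTTAATA", "TTTAGATATT", "AATCAATGAA"]

coding_pts = [features(s) for s in CODING]
noncoding_pts = [features(s) for s in NONCODING]

# PCA step: direction of greatest variance, usable as a plotting axis.
axis = first_principal_axis(coding_pts + noncoding_pts)

# DA step: a quantitative rule that assigns each sequence to a class.
w, threshold = fit_discriminant(coding_pts, noncoding_pts)
labels = [classify(s, w, threshold) for s in CODING + NONCODING]
```

In this sketch the PCA axis only suggests where the groups separate, while the discriminant gives an explicit decision rule, which is the sense in which the second method can be used to check the first. With real data one would use many more sequences and parameters, and likely an established numerical library rather than the closed-form 2x2 algebra used here.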