College of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Hebei, People's Republic of China.
Comput Biol Med. 2012 May;42(5):556-63. doi: 10.1016/j.compbiomed.2012.01.011. Epub 2012 Feb 10.
Based on the Huffman tree method, we propose a new 2D graphical representation of protein sequences. This representation completely avoids loss of information when a protein sequence is transferred to its graphical representation. The method consists of two parts. The first assigns 0-1 codes to the 20 amino acids by building a Huffman tree from amino acid frequencies, where the frequency of an amino acid is defined as the number of times it occurs in the analyzed protein sequences. The second constructs the 2D graphical representation of a protein sequence from these 0-1 codes. Applications of the method to ten ND5 genes and seven Escherichia coli strains are then presented in detail. The results show that the proposed model may provide new insights into the evolutionary patterns determined from protein sequences and complete genomes.
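The two-part method above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The first part uses a standard Huffman construction over amino acid counts. Only residues that actually occur receive codes here, and ties are broken by first appearance. The abstract does not describe the paper's actual 2D mapping, so the second part uses a hypothetical stand-in: each bit moves one unit to the right, and the walk steps up for `1` and down for `0`. The helper names `huffman_codes`, `walk_2d`, and `decode_walk` are illustrative only.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(sequences):
    """Build prefix-free 0-1 codes for amino acids from their counts in `sequences`."""
    freq = Counter()
    for seq in sequences:
        freq.update(seq)  # frequency = number of occurrences of each residue
    tiebreak = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tiebreak), {aa: ""}) for aa, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single residue type gets the one-bit code "0".
        return {aa: "0" for aa in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes with 0 and 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {aa: "0" + c for aa, c in left.items()}
        merged.update({aa: "1" + c for aa, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def walk_2d(seq, codes):
    """Hypothetical 2D curve: per bit, x += 1; y += 1 for '1' and y -= 1 for '0'."""
    x, y = 0, 0
    points = [(x, y)]
    for aa in seq:
        for bit in codes[aa]:
            x += 1
            y += 1 if bit == "1" else -1
            points.append((x, y))
    return points


def decode_walk(points, codes):
    """Recover the sequence from the curve, showing that no information is lost."""
    inverse = {c: aa for aa, c in codes.items()}
    bits = "".join("1" if b[1] > a[1] else "0" for a, b in zip(points, points[1:]))
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # prefix-free codes make this greedy match unambiguous
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


codes = huffman_codes(["MKVLAAGIVALLLAAG", "MAALKG"])
curve = walk_2d("MAAK", codes)
print(codes)
print(decode_walk(curve, codes))
```

Frequent residues receive shorter codes, so common amino acids contribute fewer steps to the curve. Because the codes are prefix-free and each step's vertical direction records one bit, the sequence can be read back from the curve exactly. This is the kind of lossless property the abstract claims for the real representation.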