Frenkel' F E, Korotkov E V
Mol Biol (Mosk). 2008 Jul-Aug;42(4):707-20.
We classified 472,288 regions of triplet periodicity found in 578,868 genes from release 29 of the KEGG databank. A new concept, the triplet periodicity class, and a measure of similarity between such classes are introduced. In total, 2,520 classes were created, containing 94% of the triplet periodicity regions found. For 92% of the triplet periodicity regions contained in classes, the triplet periodicity is linked to the reading frame in the same way. For the remaining regions, the reading frame of the gene was shifted relative to the reading frame shared by the majority of genes in the same triplet periodicity class. These regions were translated into hypothetical amino acid sequences using the reading frame defined by their triplet periodicity class. BLAST searches showed that 2,660 of these hypothetical amino acid sequences have statistically significant similarity to proteins in the UniProt databank. We suggest that the 8% of classified triplet periodicity regions showing this shift arose through reading frame shift mutations. The triplet periodicity classes created here can be used to identify the coding regions of genes and to search for mutations arising from reading frame shifts.
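To make the frame-shift comparison concrete, here is a minimal Python sketch under stated assumptions. It represents a region's triplet periodicity as a 3x4 matrix of nucleotide counts at the three codon positions, then compares that matrix with a class profile under each of the three cyclic shifts of codon position. The function names (`tp_matrix`, `rotate`, `best_frame_shift`), the use of Pearson correlation as the similarity score, and the representation of a class by a single profile matrix are all hypothetical illustrations, not the authors' published similarity measure.

```python
import math

BASES = "ACGT"


def tp_matrix(seq):
    """Hypothetical 3x4 triplet periodicity profile: counts of A, C, G, T
    at codon positions 0, 1, 2 of the sequence. Non-ACGT symbols are skipped."""
    m = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            m[i % 3][j] += 1
    return m


def similarity(a, b):
    """Assumed similarity score: Pearson correlation of two flattened matrices."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rotate(m, s):
    """Treat region codon position (k + s) % 3 as class codon position k,
    i.e. assume the class frame starts at offset s within the region."""
    return [m[(k + s) % 3] for k in range(3)]


def best_frame_shift(region, class_profile):
    """Return (shift, score): the offset s in {0, 1, 2} whose rotated profile
    best matches class_profile, and the corresponding similarity."""
    m = tp_matrix(region)
    scores = [similarity(rotate(m, s), class_profile) for s in range(3)]
    shift = max(range(3), key=lambda s: scores[s])
    return shift, scores[shift]


if __name__ == "__main__":
    class_seq = "ACG" * 30          # toy strongly periodic "class member"
    profile = tp_matrix(class_seq)  # toy class profile
    print(best_frame_shift(class_seq, profile))      # same frame as the class
    print(best_frame_shift(class_seq[1:], profile))  # one base deleted upstream
```

A returned shift of 0 means the region agrees with the class reading frame. A nonzero shift suggests reading the region from offset `shift` to recover the class frame before translating it into a hypothetical amino acid sequence, which is the kind of case the abstract attributes to reading frame shift mutations.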