Bioinformatics Laboratory, Centre of Bioengineering, Russian Academy of Sciences, 117312, Moscow, Prospect 60-tya Oktyabrya, 7/1, Russia.
Gene. 2012 Jan 1;491(1):58-64. doi: 10.1016/j.gene.2011.08.032. Epub 2011 Sep 29.
Triplet periodicity (TP) is a distinctive property of protein-coding sequences. Some complex genes have more than one TP type along their sequence; we say that such genes contain a triplet periodicity change point. The aim of this work is to find all genes that contain a TP change point and to compare the positions of these change points with known biological data. We have developed a mathematical method to identify triplet periodicity changes along a sequence. We found 311,221 genes with a TP change point in the KEGG/Genes database (version 48), which is about 8% of all 4,013,150 entries in the database. We show that repetitive sequences are not the only cause of such events. We suggest that a TP change point may indicate a fusion of genes or domains. We performed a BLAST analysis to find potential ancestral genes for the parts of a gene on either side of its TP change point. In 131,323 cases, one or both parts of a sequence with a TP change point showed sufficient similarity to other sequences. The relationship between TP change points and gene fusion events is discussed. An implementation of the method is available from the authors on request.
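The abstract does not describe the change-point method itself, so the following is only a rough illustration of the idea, not the authors' algorithm. It assumes that a sequence's TP type can be summarized as the frequency of each nucleotide at each of the three codon positions, and that a change point is the split whose two halves have the most different profiles (L1 distance, exhaustive scan, no significance test). The function names `tp_profile`, `profile_distance` and `find_change_point` and the `min_len` parameter are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def tp_profile(seq, offset=0):
    """Frequencies of each nucleotide at codon positions 0, 1 and 2.

    `offset` is the index of seq[0] in the full sequence, so codon positions
    are counted in the reading frame of the whole gene (an assumption).
    """
    counts = [[0] * 4 for _ in range(3)]  # counts[codon_position][nucleotide]
    for i, base in enumerate(seq):
        j = NUCLEOTIDES.find(base)
        if j >= 0:  # ignore ambiguous symbols such as N
            counts[(offset + i) % 3][j] += 1
    profile = []
    for row in counts:
        total = sum(row)
        profile.append([c / total if total else 0.0 for c in row])
    return profile


def profile_distance(p, q):
    """L1 distance between two 3 x 4 triplet-periodicity profiles."""
    return sum(abs(a - b) for row_p, row_q in zip(p, q) for a, b in zip(row_p, row_q))


def find_change_point(seq, min_len=30):
    """Return (split, score) for the split maximizing the profile distance.

    Hypothetical exhaustive scan; returns (None, -1.0) if the sequence is
    shorter than 2 * min_len. No statistical significance is assessed.
    """
    seq = seq.upper()
    best_split, best_score = None, -1.0
    for k in range(min_len, len(seq) - min_len + 1):
        left = tp_profile(seq[:k], offset=0)
        right = tp_profile(seq[k:], offset=k)
        score = profile_distance(left, right)
        if score > best_score:  # keep the first split with the maximal score
            best_split, best_score = k, score
    return best_split, best_score


if __name__ == "__main__":
    # Synthetic "fused" sequence: two blocks with different codon-position biases.
    gene = "GCA" * 50 + "TTG" * 50
    print(find_change_point(gene))  # -> (150, 6.0)
```

On real genes, a method like this would also need a null model (for example, shuffled sequences) to decide whether the best split reflects a genuine change in TP type or just noise; the sketch omits that step.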