Frenkel F E, Korotkov E V
Bioengineering Centre of RAS, 60-letiya Oktyabrya prosp., 7/1, Moscow, Russia.
DNA Res. 2009 Apr;16(2):105-14. doi: 10.1093/dnares/dsp002. Epub 2009 Mar 3.
We introduce a novel approach for detecting mutations that may have caused a reading frame (RF) shift in a gene. Insertions and deletions in DNA coding regions are significant events for a gene, because an RF shift alters an extensive region of the amino acid sequence that the gene encodes. The method is based on the triplet periodicity (TP) of coding regions and on its relative resistance to substitutions in the DNA sequence. We attempted to extend 326 933 regions of continuous TP found in genes from the KEGG databank by allowing for possible insertions and deletions. In total, we found 824 genes in which such an extension was possible and statistically significant. We then generated amino acid sequences for both the active (KEGG-annotated) RF and the hypothetical ancient RF, in order to find confirmation of a shift at the protein level. Of these, 64 sequences showed protein similarity only for the ancient RF, 176 only for the active RF, 3 for both, and 581 showed no protein similarity at all. Our aim was to establish a lower bound on the number of genes in which a shift between RF and TP is possible. Further ways to increase the number of detected RF shifts are discussed.
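The idea that TP keeps a record of the reading frame can be illustrated with a small sketch. It is not the statistic used in the paper: the function names (`tp_matrix`, `tp_information`, `best_phase`), the mutual-information measure, and the toy sequences are all assumptions chosen for illustration. The sketch counts nucleotides by codon position, measures how strongly position and nucleotide are coupled, and finds the phase offset at which a downstream segment best matches an upstream one.

```python
from collections import Counter
from math import log


def tp_matrix(seq, phase=0):
    """Count nucleotides at each of the three codon positions (hypothetical helper)."""
    counts = [Counter() for _ in range(3)]
    for i, base in enumerate(seq):
        counts[(i + phase) % 3][base] += 1
    return counts


def tp_information(seq):
    """Mutual information (nats) between codon position and nucleotide.

    Used here as a simple stand-in for a TP measure; the paper's own
    measure may differ.
    """
    counts = tp_matrix(seq)
    n = len(seq)
    base_total = Counter(seq)
    info = 0.0
    for pos in range(3):
        pos_total = sum(counts[pos].values())
        for base, c in counts[pos].items():
            info += (c / n) * log(c * n / (pos_total * base_total[base]))
    return info


def best_phase(reference, segment):
    """Phase offset (0, 1 or 2) at which `segment` best matches the TP of `reference`."""
    ref = tp_matrix(reference)
    scores = []
    for shift in range(3):
        seg = tp_matrix(segment, phase=shift)
        scores.append(sum(ref[p][b] * seg[p][b] for p in range(3) for b in "ACGT"))
    return max(range(3), key=scores.__getitem__)


# Toy data (an assumption): a perfectly periodic region with one inserted base.
upstream = "ATG" * 15
downstream = "C" + "ATG" * 15      # single-base insertion, then the same period
intact = upstream + "ATG" * 15
shifted = upstream + downstream

# The insertion weakens TP measured over the whole region...
print(tp_information(intact) > tp_information(shifted))
# ...and the downstream part matches the upstream TP only at a non-zero phase.
print(best_phase(upstream, upstream), best_phase(upstream, downstream))
```

On real genes the periodicity is far weaker than in this toy sequence, so any such comparison would need a significance test against shuffled sequences before a frame shift could be claimed.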