Casino A, Cipollaro M, Guerrini A M, Mastrocinque G, Spena A, Scarlato V
Nucleic Acids Res. 1981 Mar 25;9(6):1499-518. doi: 10.1093/nar/9.6.1499.
A Fortran computer algorithm has been used to analyze the nucleotide sequences of several structural genes. The analysis, performed on both the coding and the complementary DNA strands, shows that open reading frames shorter than 100 codons are randomly distributed between the two strands, whereas open reading frames longer than 100 codons ("virtual genes") are significantly more frequent on the complementary strand than on the coding strand. These "virtual genes" were further investigated by examining intron sequences, splicing points and signal sequences, and by analyzing gene mutations. On the basis of this analysis, the coding and complementary DNA strands of several eukaryotic structural genes cannot be distinguished. In particular, we suggest that the complementary DNA strand of the human epsilon-globin gene might indeed code for a protein.
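The strand comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' Fortran program, which the abstract does not describe in detail. It assumes that an open reading frame is any uninterrupted run of non-stop codons in one of the three reading frames of a strand, and that the standard stop codons TAA, TAG and TGA apply. The function names `reverse_complement`, `orf_lengths` and `count_long_orfs` are hypothetical, chosen here for illustration only.

```python
# Assumption: standard stop codons; the abstract does not state which were used.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def orf_lengths(seq):
    """Lengths, in codons, of every stop-free run in the three frames of seq.

    Assumption: an ORF is a maximal run of non-stop codons; no start codon
    is required. Results are listed frame by frame (frame 0, then 1, then 2).
    """
    seq = seq.upper()
    lengths = []
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if run:
                    lengths.append(run)
                run = 0
            else:
                run += 1
        if run:  # run still open at the end of the sequence
            lengths.append(run)
    return lengths


def count_long_orfs(seq, min_codons=100):
    """Count ORFs longer than min_codons on the given and complementary strands.

    Returns a (coding_count, complementary_count) tuple, where seq is taken
    to be the coding strand.
    """
    coding = sum(1 for n in orf_lengths(seq) if n > min_codons)
    complementary = sum(
        1 for n in orf_lengths(reverse_complement(seq)) if n > min_codons
    )
    return coding, complementary
```

Calling `count_long_orfs(gene_sequence)` on a coding-strand sequence gives the two counts whose comparison underlies the abstract's claim. Whether a difference between them is significant would need a separate statistical test, which this sketch does not attempt.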