Castellano S, Morozova N, Morey M, Berry MJ, Serras F, Corominas M, Guigó R
Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, Dr. Aiguader 80, 08003 Barcelona, Spain.
EMBO Rep. 2001 Aug;2(8):697-702. doi: 10.1093/embo-reports/kve151.
In selenoproteins, incorporation of the amino acid selenocysteine is specified by the UGA codon, usually a stop signal. The alternative decoding of UGA is conferred by an mRNA structure, the SECIS element, located in the 3'-untranslated region of the selenoprotein mRNA. Because of the non-standard use of the UGA codon, current computational gene prediction methods are unable to identify selenoproteins in eukaryotic genome sequences. Here we describe a method to predict selenoproteins in genomic sequences, which relies on the prediction of SECIS elements in coordination with the prediction of genes in which the strong codon bias characteristic of protein coding regions extends beyond a TGA codon interrupting the open reading frame. We applied the method to the Drosophila melanogaster genome, and predicted four potential selenoprotein genes. One of them belongs to a known family of selenoproteins, and we have experimentally tested two other predictions, with positive results. Finally, we have characterized the expression pattern of these two novel selenoprotein genes.
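The codon-bias half of the idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the add-one smoothing, the uniform background model, and the window and threshold defaults are all assumptions made for illustration, and SECIS element prediction is omitted entirely. The sketch flags an in-frame TGA when the codons on both sides score as coding under a codon-usage table estimated from known coding sequences.

```python
import math
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

def codons(seq, frame=0):
    """Split a DNA sequence into complete codons in the given frame."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def codon_frequencies(coding_seqs):
    """Estimate codon frequencies from known coding sequences
    (add-one smoothing; an assumption of this sketch)."""
    counts = Counter()
    for s in coding_seqs:
        counts.update(codons(s))
    total = sum(counts[c] for c in ALL_CODONS) + len(ALL_CODONS)
    return {c: (counts[c] + 1) / total for c in ALL_CODONS}

def coding_score(codon_list, freq):
    """Mean log-likelihood ratio of the codons under the coding model
    versus a uniform background of 1/64 per codon."""
    if not codon_list:
        return 0.0
    uniform = 1 / 64
    return sum(math.log(freq.get(c, uniform) / uniform)
               for c in codon_list) / len(codon_list)

def flank(cds, start, step, window):
    """Collect up to `window` codons from `start`, moving by `step`,
    stopping at the first stop codon or at the end of the sequence."""
    out = []
    j = start
    while 0 <= j < len(cds) and len(out) < window and cds[j] not in STOPS:
        out.append(cds[j])
        j += step
    return out

def candidate_tga_sites(seq, freq, frame=0, window=10, min_flank=10,
                        threshold=0.0):
    """Return nucleotide offsets of in-frame TGA codons whose upstream
    and downstream flanks both score above `threshold`."""
    cds = codons(seq, frame)
    hits = []
    for i, c in enumerate(cds):
        if c != "TGA":
            continue
        up = flank(cds, i - 1, -1, window)
        down = flank(cds, i + 1, 1, window)
        if len(up) < min_flank or len(down) < min_flank:
            continue
        if (coding_score(up, freq) > threshold
                and coding_score(down, freq) > threshold):
            hits.append(frame + 3 * i)
    return hits
```

Calling `candidate_tga_sites(sequence, codon_frequencies(training_seqs))` returns the offsets of TGA codons that are plausible selenocysteine sites under this toy model. In a fuller pipeline, such candidates would still need to be paired with a predicted SECIS element downstream in the 3'-untranslated region, as the abstract describes.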