Aurora R, Rose GD
Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, USA.
Proc Natl Acad Sci U S A. 1998 Mar 17;95(6):2818-23. doi: 10.1073/pnas.95.6.2818.
We have developed a simple procedure to identify protein homologs in genomic databases. The program, called ORF, is based on comparisons of predicted secondary structure. Protein structure is far better conserved than amino acid sequence, and structure-based methods have been effective in exploiting this fact to find homologs, even among proteins with scant sequence identity. ORF is a secondary structure-based method that operates solely on predictions from sequence and requires no experimentally determined information about the structure. The approach is illustrated by an example: Thymidylate synthase, a highly conserved enzyme essential to thymidine biosynthesis in both prokaryotes and eukaryotes, is thought to be used by Archaea, but a corresponding gene has yet to be identified. Here, a candidate thymidylate synthase is identified as a previously unassigned open reading frame from the genome of Methanococcus jannaschii, viz., MJ0757. Using primary structure information alone, the optimally aligned sequence identity between MJ0757 and Escherichia coli thymidylate synthase is 7%, well below the threshold of sensitivity for detection by sequence-based methods.
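The abstract does not spell out ORF's scoring or alignment procedure. The sketch below is therefore a generic stand-in, not the authors' method. It shows the basic idea of comparing predicted secondary structure by globally aligning two three-state strings (H = helix, E = strand, C = coil) with textbook Needleman-Wunsch, then reporting percent identity over the aligned columns. The function names `align_ss` and `percent_identity` are hypothetical. So are the scoring values (match +2, mismatch -1, linear gap -2). The identity convention (identical columns divided by gap-free columns) is one of several in common use.

```python
from typing import Tuple

# Hypothetical scoring scheme (assumption; NOT the parameters used by ORF).
MATCH = 2
MISMATCH = -1
GAP = -2


def align_ss(a: str, b: str, match: int = MATCH, mismatch: int = MISMATCH,
             gap: int = GAP) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment of two secondary-structure strings.

    Returns (optimal score, aligned a, aligned b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical columns / columns where neither string has a gap, as a percentage."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


if __name__ == "__main__":
    # Toy predicted secondary structures (made up for illustration).
    s, top, bottom = align_ss("HHHCCEEE", "HHHEEE")
    print(s)       # → 8
    print(top)     # → HHHCCEEE
    print(bottom)  # → HHH--EEE
    print(percent_identity(top, bottom))  # → 100.0
```

The same routine works on amino acid strings, but real sequence comparisons use substitution matrices and affine gap penalties. The toy scores here are not meant to reproduce the 7% identity quoted above.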