Murphy Eain, Rigoutsos Isidore, Shibuya Tetsuo, Shenk Thomas E
Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA.
Proc Natl Acad Sci U S A. 2003 Nov 11;100(23):13585-90. doi: 10.1073/pnas.1735466100. Epub 2003 Oct 30.
The Bio-Dictionary-based Gene Finder was used to reassess the coding potential of the AD169 laboratory strain of human cytomegalovirus and sequences in the Toledo strain that are missing in the laboratory strain of the virus. The gene-finder algorithm assesses the potential of an ORF to encode a protein based on matches to a database of amino acid patterns derived from a large collection of proteins. The algorithm was used to score all human cytomegalovirus ORFs with the potential to encode polypeptides ≥50 aa in length. As a further test for functionality, the genomes of the chimpanzee, rhesus, and murine cytomegaloviruses were searched for orthologues of the predicted human cytomegalovirus ORFs. The analysis indicates that 37 previously annotated ORFs ought to be discarded, and at least nine previously unrecognized ORFs with relatively strong coding potential should be added. Thus, the human cytomegalovirus genome appears to contain approximately 192 unique ORFs with the potential to encode a protein. Support for several of the predictions of our in silico analysis was obtained by sequencing several domains within a clinical isolate of human cytomegalovirus.
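The candidate-selection step described above (all ORFs able to encode polypeptides of at least 50 aa) could be sketched roughly as follows. This is only an illustration, not the Bio-Dictionary-based Gene Finder itself: the pattern-matching score is not reproduced, and the function names, the requirement of an ATG start codon, and the coordinate convention are all assumptions made for the example.

```python
# Hypothetical sketch: enumerate candidate ORFs on both strands.
# Assumption: an ORF runs from an ATG to the next in-frame stop codon;
# the abstract does not state how ORF boundaries were defined.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Return (strand, start, end, length_aa) for ORFs of at least min_aa codons.

    start and end are 0-based offsets into the scanned strand;
    end is the offset of the stop codon, which is not counted in length_aa.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF in this frame
                elif start is not None and codon in STOPS:
                    length_aa = (i - start) // 3
                    if length_aa >= min_aa:
                        orfs.append((strand, start, i, length_aa))
                    start = None  # look for the next ORF in this frame
    return orfs
```

In a real pipeline, each candidate returned by `find_orfs` would then be translated and passed to a coding-potential scorer; that scoring step is outside the scope of this sketch.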