Koonin E V, Bork P, Sander C
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.
EMBO J. 1994 Feb 1;13(3):493-503. doi: 10.1002/j.1460-2075.1994.tb06287.x.
One year after the release of the sequence of yeast chromosome III, we have re-examined its open reading frames (ORFs) by computer methods. More than 61% of the 171 probable gene products have significant sequence similarities in the current databases; as many as 54% have already known functions or are related to functionally characterized proteins, allowing partial prediction of protein function, 11 percentage points more than reported a year ago; 19% are similar to proteins of known three-dimensional structure, allowing model building by homology. The most interesting new identifications include a sugar kinase distantly related to ribokinases, a phosphatidyl serine synthetase, a putative transcription regulator, a flavodoxin-like protein, and a zinc finger protein belonging to a distinct subfamily. Several ORFs have similarities to uncharacterized proteins, resulting in new families 'in search of a function'. About 54% of ORFs match sequences from other phyla, including numerous fragments in the database of expressed sequence tags (ESTs). Most significant similarities to ESTs are with proteins in conserved families widely represented in the databases. About 30% of ORFs contain one or more predicted transmembrane segments. The increase in the power of functional and structural prediction comes from improvements in sequence analysis and from richer databases and is expected to facilitate substantially the experimental effort in characterizing the function of new gene products.