Comprehensive sequence analysis of the 182 predicted open reading frames of yeast chromosome III.

Author information

Bork P, Ouzounis C, Sander C, Scharf M, Schneider R, Sonnhammer E

Affiliation

European Molecular Biology Laboratory, Heidelberg, Germany.

Publication information

Protein Sci. 1992 Dec;1(12):1677-90. doi: 10.1002/pro.5560011216.

Abstract

With the completion of the first phase of the European yeast genome sequencing project, the complete DNA sequence of chromosome III of Saccharomyces cerevisiae has become available (Oliver, S. G., et al., 1992, Nature 357, 38-46). We have tested the predictive power of computer sequence analysis of the 176 probable protein products of this chromosome, after exclusion of six problem cases. When the results of database similarity searches are pooled with prior knowledge, a likely function can be assigned to 42% of the proteins, and a predicted three-dimensional structure to a third of these (14% of the total). The function of the remaining 58% remains to be determined. Of these, about one-third have one or more probable transmembrane segments. Among the most interesting proteins with predicted functions are a new member of the type X polymerase family, a transcription factor with an N-terminal DNA-binding domain related to GAL4, a "fork head" DNA-binding domain previously known only in Drosophila and in mammals, and a putative methyltransferase. Our analysis increased the number of known significant sequence similarities on chromosome III by 13, to 67. Although the near 40% success rate of identifying unknown protein function by sequence analysis is surprisingly high, the information gap between known protein sequences and unknown function is expected to widen and become a major bottleneck of genome projects in the near future. Based on the experience gained in this test study, we suggest that the development of an automated computer workbench for protein sequence analysis must be an important item in genome projects.
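The abstract reports its results as percentages of the 176 analysed proteins. The following is a back-of-envelope sketch that converts those percentages into approximate protein counts. It assumes simple rounding to the nearest whole protein. The function name `implied_counts` is hypothetical, and the integers it produces are estimates, not counts reported in the paper. It also shows that "a third of these (14% of the total)" is internally consistent: roughly 74 / 3 ≈ 25 proteins.

```python
# Back-of-envelope counts implied by the percentages quoted in the abstract.
# Assumption: each percentage is applied to the 176 analysed proteins and
# rounded to the nearest integer. These are estimates, not published counts.

TOTAL_ORFS = 182   # predicted open reading frames on chromosome III
EXCLUDED = 6       # "problem cases" excluded from the analysis


def implied_counts(total_orfs: int = TOTAL_ORFS, excluded: int = EXCLUDED) -> dict:
    """Return approximate protein counts for each category in the abstract."""
    analysed = total_orfs - excluded            # 176 probable protein products
    with_function = round(0.42 * analysed)      # 42% assigned a likely function
    with_structure = round(0.14 * analysed)     # 14% with a predicted 3D structure
    unknown = analysed - with_function          # remaining ~58% of unknown function
    unknown_with_tm = round(unknown / 3)        # "about one-third" with TM segments
    return {
        "analysed": analysed,
        "with_function": with_function,
        "with_structure": with_structure,
        "unknown": unknown,
        "unknown_with_tm": unknown_with_tm,
    }


if __name__ == "__main__":
    for name, value in implied_counts().items():
        print(f"{name}: {value}")
```

Because the abstract rounds its own percentages, these estimates may differ by one or two proteins from the figures given in the full paper.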

Cited by

20 years of the SMART protein domain annotation resource.
Nucleic Acids Res. 2018 Jan 4;46(D1):D493-D496. doi: 10.1093/nar/gkx922.

Solving the Problem: Genome Annotation Standards before the Data Deluge.
Stand Genomic Sci. 2011 Oct 15;5(1):168-93. doi: 10.4056/sigs.2084864. Epub 2011 Oct 1.

The past, present and future of genome-wide re-annotation.
Genome Biol. 2002;3(2):COMMENT2001. doi: 10.1186/gb-2002-3-2-comment2001. Epub 2002 Jan 31.
