
Classification of protein quaternary structure by functional domain composition.

Authors

Yu Xiaojing, Wang Chuan, Li Yixue

Affiliation

Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China.

Publication

BMC Bioinformatics. 2006 Apr 4;7:187. doi: 10.1186/1471-2105-7-187.

Abstract

BACKGROUND

The number and arrangement of the subunits that form a protein are referred to as its quaternary structure. Quaternary structure is an important protein attribute that is closely related to function. Proteins with quaternary structure are called oligomeric proteins, and they are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. It is therefore highly desirable to develop computational methods that automatically classify the quaternary structure of proteins from their sequences.

RESULTS

To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the Pfam database. The nearest neighbor algorithm (NNA) was then used to classify the quaternary structure of proteins from this representation. A jackknife cross-validation test was performed on a non-redundant protein dataset in which sequence identity was below 25%, giving an overall success rate of 75.17%. Additionally, to demonstrate the effectiveness of the method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11%.
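The pipeline described above (domain-composition vector, nearest neighbor classification, jackknife test) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the binary encoding of domain presence, the cosine-based distance, the domain names (`domA` to `domD`), the class labels, and the toy dataset.

```python
import math

def domain_vector(protein_domains, domain_index):
    """Binary vector: 1.0 where the protein contains a domain, else 0.0.
    (Binary encoding is an assumption; the abstract only says 'a vector'.)"""
    vec = [0.0] * len(domain_index)
    for d in protein_domains:
        if d in domain_index:
            vec[domain_index[d]] = 1.0
    return vec

def cosine_distance(a, b):
    """1 - cosine similarity; an assumed distance measure for this sketch."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # a protein with no known domain matches nothing
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbor(query, training):
    """Return the label of the training vector closest to the query."""
    best_label, best_dist = None, float("inf")
    for vec, label in training:
        dist = cosine_distance(query, vec)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def jackknife_success_rate(dataset):
    """Leave each protein out in turn and predict it from all the others."""
    correct = 0
    for i, (vec, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]
        if nearest_neighbor(vec, rest) == label:
            correct += 1
    return correct / len(dataset)

# Hypothetical domain vocabulary and toy proteins (not real data).
domains = ["domA", "domB", "domC", "domD"]
domain_index = {d: i for i, d in enumerate(domains)}
toy_proteins = [
    ({"domA", "domB"}, "monomer"),
    ({"domA"}, "monomer"),
    ({"domC", "domD"}, "homodimer"),
    ({"domC"}, "homodimer"),
]
dataset = [(domain_vector(ds, domain_index), label) for ds, label in toy_proteins]
rate = jackknife_success_rate(dataset)
print(f"jackknife success rate: {rate:.2%}")
```

In the jackknife test each protein is classified using every other protein as training data, so no sample is ever predicted from itself; the reported success rate is the fraction of left-out proteins whose class is recovered.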

CONCLUSION

The results indicate that, compared with the amino acid composition method and BLAST, the domain composition approach may be a more effective and promising high-throughput method for this complicated problem in bioinformatics.


