Classification of protein quaternary structure by functional domain composition.

Authors

Yu Xiaojing, Wang Chuan, Li Yixue

Affiliation

Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China.

Publication

BMC Bioinformatics. 2006 Apr 4;7:187. doi: 10.1186/1471-2105-7-187.

DOI:10.1186/1471-2105-7-187

PMID:16584572

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1450311/

Abstract

BACKGROUND

The number and arrangement of the subunits that form a protein are referred to as its quaternary structure. Quaternary structure is an important protein attribute that is closely related to function. Proteins with quaternary structure are called oligomeric proteins, and they are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Computational methods that automatically classify the quaternary structure of proteins from their sequences are therefore highly desirable.

RESULTS

To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used to classify the quaternary structure of proteins from this information. A jackknife cross-validation test was performed on a non-redundant protein dataset in which the sequence identity was less than 25%, giving an overall success rate of 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11%.
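The pipeline described above (domain composition vector, nearest neighbor classification, jackknife test) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the vector encoding or the distance measure, so a binary presence/absence vector and a cosine-based distance are assumed here, and the domain names, class labels, and function names are all hypothetical.

```python
import math

def domain_vector(domains, vocabulary):
    """Binary vector (assumed encoding): 1.0 if the domain occurs in the protein."""
    present = set(domains)
    return [1.0 if d in present else 0.0 for d in vocabulary]

def cosine_distance(u, v):
    """Assumed distance: 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def nearest_neighbor(query, training):
    """Return the label of the training vector closest to the query."""
    return min(training, key=lambda item: cosine_distance(query, item[0]))[1]

def jackknife_accuracy(samples):
    """Leave each sample out in turn, predict it from the rest, count hits."""
    correct = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor(vec, rest) == label:
            correct += 1
    return correct / len(samples)

# Toy data with placeholder domain names (not real Pfam accessions).
vocabulary = ["domA", "domB", "domC", "domD"]
proteins = [
    (["domA"], "monomer"),
    (["domA", "domB"], "monomer"),
    (["domC"], "homodimer"),
    (["domC", "domD"], "homodimer"),
]
samples = [(domain_vector(d, vocabulary), label) for d, label in proteins]
accuracy = jackknife_accuracy(samples)
print(accuracy)
```

In this toy set every protein shares a domain only with a protein of its own class, so each left-out sample is recovered correctly. Real data would need a rule for proteins with no annotated domain, which this sketch simply treats as maximally distant from everything.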

CONCLUSION

Comparison with the amino acid composition method and BLAST indicates that the domain composition approach may be a more effective and promising high-throughput method for dealing with this complicated problem in bioinformatics.
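For context on the baseline named above, amino acid composition is commonly taken to mean a 20-dimensional vector of residue frequencies. The sketch below assumes that standard definition; the abstract does not give the exact formulation used in the comparison, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def amino_acid_composition(sequence):
    """Fraction of each standard residue; non-standard letters are ignored."""
    seq = sequence.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]

composition = amino_acid_composition("AACD")
print(composition[:3])
```

Such a vector discards residue order and domain identity entirely, which is one plausible reason a domain-based representation could carry more information about oligomeric state.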
