Secondary structure-based assignment of the protein structural classes.

Author information

Kurgan Lukasz A, Zhang Tuo, Zhang Hua, Shen Shiyi, Ruan Jishou

Affiliation

Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada.

Publication information

Amino Acids. 2008 Oct;35(3):551-64. doi: 10.1007/s00726-008-0080-3. Epub 2008 Apr 22.

Abstract

Structural classes categorize proteins based on the amount and arrangement of their constituent secondary structures. Knowledge of structural classes is applied in numerous important predictive tasks that address structural and functional features of proteins. We propose novel structural class assignment methods that use one-dimensional (1D) secondary structure as the input. The methods are designed using a large set of low-identity sequences for which the secondary structure is either predicted from the sequence (PSSA(sc) model) or assigned based on the tertiary structure (SSA(sc) model). The secondary structure is encoded using a comprehensive set of features describing the count, content, and size of secondary structure segments, which are fed into a small decision tree that uses ten features to perform the assignment. The proposed models were compared against seven secondary structure-based and ten sequence-based structural class predictors. Using the 1D secondary structure, SSA(sc) and PSSA(sc) can assign proteins to the four main structural classes, while the existing secondary structure-based assignment methods can predict only three classes. Empirical evaluation shows that the proposed models are promising. Using the structure-based assignment performed in SCOP (Structural Classification of Proteins) as the gold standard, the accuracy of SSA(sc) and PSSA(sc) is 76% and 75%, respectively. We show that using secondary structure predicted from the sequence as the input does not have a detrimental effect on the quality of structural class assignment when compared with using secondary structure derived from the tertiary structure. Therefore, PSSA(sc) can be used to perform automated assignment of structural classes based on sequences alone.
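
The feature encoding described in the abstract (count, content, and size of secondary structure segments) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three-state alphabet (H for helix, E for strand, C for coil), the function names `segments` and `encode`, and the specific feature names are all assumptions made for illustration, and the abstract does not say which ten features the decision tree actually uses.

```python
from itertools import groupby


def segments(ss):
    """Split a secondary structure string into (state, length) runs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def encode(ss):
    """Compute illustrative count, content, and size features.

    Assumes a three-state string: H (helix), E (strand), C (coil).
    The feature names are hypothetical, not those of the paper.
    """
    if not ss:
        raise ValueError("empty secondary structure string")
    runs = segments(ss)
    features = {}
    for state in "HE":
        lengths = [n for s, n in runs if s == state]
        # Count: number of segments of this state.
        features[f"count_{state}"] = len(lengths)
        # Content: fraction of residues in this state.
        features[f"content_{state}"] = sum(lengths) / len(ss)
        # Size: longest and average segment length.
        features[f"max_size_{state}"] = max(lengths, default=0)
        features[f"avg_size_{state}"] = (
            sum(lengths) / len(lengths) if lengths else 0.0
        )
    return features


if __name__ == "__main__":
    print(encode("CCHHHHHHCCEEEECCHHHHCC"))
```

A feature vector of this kind could then be passed to any decision tree learner. Whether the secondary structure string comes from a predictor or from the tertiary structure only changes the input, not the encoding.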
