Protein Sequence Classification Using Natural Vector and Convex Hull Method.

Author information

Wang Yi, Tian Kun, Yau Stephen S-T

Affiliation

Department of Mathematical Sciences, Tsinghua University, Beijing, P.R. China.

Publication information

J Comput Biol. 2019 Apr;26(4):315-321. doi: 10.1089/cmb.2018.0216. Epub 2019 Feb 14.

Abstract

Protein kinase C (PKC) is a superfamily of enzymes, which regulate numerous cellular responses. The specific function of PKC protein family is mainly governed by its individual protein domains. However, existing protein sequence classification methods based on sequence alignment and sequence analysis models focused little on the domain analysis. In this study, we introduce a novel protein kinase classification method that considers both domain sequence similarity and whole sequence similarity to quantify the evolutionary distance from a specific protein to a protein family. Using the natural vector method, we establish a 60-dimensional space, where each protein is uniquely represented by a vector. We also define a convex hull, consisting of the natural vectors corresponding to all members of a protein family. The sequence similarity between a protein and a protein family, therefore, can be quantified as the distance between the protein vector and the protein family convex hull. We have applied this method in a PKC sample library and the results showed a higher accuracy of classification compared with other alignment-free methods.
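
As a rough illustration of the two ingredients the abstract describes, the sketch below computes a 60-dimensional natural vector and the Euclidean distance from a vector to the convex hull of a family's vectors. It assumes the commonly published natural vector definition (count, mean position, and normalized second central moment for each of the 20 standard amino acids). The abstract does not spell out the formula, the distance solver, or how domain and whole-sequence similarity are combined, so the function names, the Frank-Wolfe solver, and the nearest-hull rule here are illustrative assumptions rather than the authors' implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def natural_vector(seq):
    """Return [counts..., mean positions..., scaled second moments...] (60 values).

    Assumed definition: positions are 1-based; the second central moment of
    residue k is divided by (n_k * n), where n is the full sequence length.
    Non-standard letters are ignored but still count toward n.
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        positions = [i + 1 for i, c in enumerate(seq) if c == aa]
        n_k = len(positions)
        if n_k == 0:
            # Absent residue: all three components are set to zero.
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(positions) / n_k
        counts.append(float(n_k))
        means.append(mu)
        moments.append(sum((p - mu) ** 2 for p in positions) / (n_k * n))
    return counts + means + moments

def distance_to_hull(point, hull_points, iterations=2000, tol=1e-12):
    """Approximate Euclidean distance from `point` to conv(hull_points).

    Uses Frank-Wolfe with exact line search on 0.5 * ||x - point||^2.
    The solver choice is an assumption; any quadratic programming
    routine could be substituted.
    """
    x = list(hull_points[0])  # start at an arbitrary vertex
    for _ in range(iterations):
        grad = [xi - pi for xi, pi in zip(x, point)]
        # Vertex that minimizes the linearized objective.
        s = min(hull_points, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
        d = [si - xi for si, xi in zip(s, x)]
        gap = -sum(g * di for g, di in zip(grad, d))  # Frank-Wolfe gap, >= 0
        if gap <= tol:
            break  # x is (numerically) the closest point in the hull
        step = min(1.0, gap / sum(di * di for di in d))
        x = [xi + step * di for xi, di in zip(x, d)]
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, point)))

def nearest_family(point, families):
    """Return the family name whose hull is closest to `point`.

    `families` maps a (hypothetical) family name to a list of vectors.
    """
    return min(families, key=lambda name: distance_to_hull(point, families[name]))
```

In use, each family would be represented by the natural vectors of its known members, and a query protein would be assigned to the family returned by `nearest_family(natural_vector(query_sequence), families)`. A point inside a family's hull has distance zero, so ties between overlapping hulls would need an extra rule that the abstract does not describe.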
