Protein Sequence Classification Using Natural Vector and Convex Hull Method.

Authors

Wang Yi, Tian Kun, Yau Stephen S-T

Affiliation

Department of Mathematical Sciences, Tsinghua University, Beijing, P.R. China.

Publication information

J Comput Biol. 2019 Apr;26(4):315-321. doi: 10.1089/cmb.2018.0216. Epub 2019 Feb 14.

DOI:10.1089/cmb.2018.0216
PMID:30762422
Abstract

Protein kinase C (PKC) is a superfamily of enzymes, which regulate numerous cellular responses. The specific function of PKC protein family is mainly governed by its individual protein domains. However, existing protein sequence classification methods based on sequence alignment and sequence analysis models focused little on the domain analysis. In this study, we introduce a novel protein kinase classification method that considers both domain sequence similarity and whole sequence similarity to quantify the evolutionary distance from a specific protein to a protein family. Using the natural vector method, we establish a 60-dimensional space, where each protein is uniquely represented by a vector. We also define a convex hull, consisting of the natural vectors corresponding to all members of a protein family. The sequence similarity between a protein and a protein family, therefore, can be quantified as the distance between the protein vector and the protein family convex hull. We have applied this method in a PKC sample library and the results showed a higher accuracy of classification compared with other alignment-free methods.
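The two geometric ingredients named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the commonly used form of the natural vector (for each of the 20 amino acids: the count, the mean 1-based position, and the second central moment of positions scaled by count times sequence length), which gives 20 × 3 = 60 dimensions; the abstract itself does not spell out the components. The abstract also does not say how the distance to the convex hull is computed, so a plain Frank-Wolfe iteration is swapped in here. The function names `natural_vector`, `hull_distance`, and `classify` are hypothetical, and the domain-level similarity described in the abstract is not modelled.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def natural_vector(seq):
    """Return a 60-dim vector: 20 counts, 20 mean positions, 20 scaled 2nd moments.

    Assumed definition (not given in the abstract): for residue k with
    positions s_1..s_{n_k} in a sequence of length n,
    mu_k = mean(s_i) and D2_k = sum((s_i - mu_k)^2) / (n_k * n).
    Non-standard letters count toward n but get no coordinates of their own.
    """
    n = len(seq)
    positions = defaultdict(list)
    for i, aa in enumerate(seq, start=1):  # positions are 1-based
        positions[aa].append(i)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        pos = positions.get(aa, [])
        n_k = len(pos)
        if n_k == 0:  # absent residue: all three coordinates are zero
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(pos) / n_k
        counts.append(float(n_k))
        means.append(mu)
        moments.append(sum((p - mu) ** 2 for p in pos) / (n_k * n))
    return counts + means + moments


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hull_distance(point, members, iters=2000):
    """Approximate Euclidean distance from `point` to the convex hull of `members`.

    Frank-Wolfe with exact line search (an assumed solver, chosen for brevity).
    """
    x = list(members[0])  # any hull point is a valid start
    for _ in range(iters):
        grad = [xi - pi for xi, pi in zip(x, point)]
        v = min(members, key=lambda m: _dot(grad, m))  # best vertex for this gradient
        d = [xi - vi for xi, vi in zip(x, v)]
        dd = _dot(d, d)
        if dd == 0.0:
            break  # already at the selected vertex
        gamma = max(0.0, min(1.0, _dot(grad, d) / dd))  # exact step, clipped to [0, 1]
        if gamma == 0.0:
            break  # no vertex improves the objective: x is optimal
        x = [xi - gamma * di for xi, di in zip(x, d)]
    diff = [xi - pi for xi, pi in zip(x, point)]
    return _dot(diff, diff) ** 0.5


def classify(seq, families):
    """Assign `seq` to the family whose convex hull is nearest.

    `families` maps a family name to a list of natural vectors of its members.
    """
    vec = natural_vector(seq)
    return min(families, key=lambda name: hull_distance(vec, families[name]))
```

In this sketch a family is just a list of natural vectors, so a call such as `classify(query_sequence, {"family_a": [...], "family_b": [...]})` returns the name of the family with the smallest hull distance. Frank-Wolfe converges slowly near the boundary; a quadratic-programming solver would be the more usual choice when precision matters.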

Similar articles

1
Protein Sequence Classification Using Natural Vector and Convex Hull Method.
J Comput Biol. 2019 Apr;26(4):315-321. doi: 10.1089/cmb.2018.0216. Epub 2019 Feb 14.
2
Convex hull principle for classification and phylogeny of eukaryotic proteins.
Genomics. 2019 Dec;111(6):1777-1784. doi: 10.1016/j.ygeno.2018.11.033. Epub 2018 Dec 5.
3
Convex hull analysis of evolutionary and phylogenetic relationships between biological groups.
J Theor Biol. 2018 Nov 7;456:34-40. doi: 10.1016/j.jtbi.2018.07.035. Epub 2018 Jul 27.
4
Classification of Protein Sequences by a Novel Alignment-Free Method on Bacterial and Virus Families.
Genes (Basel). 2022 Sep 27;13(10):1744. doi: 10.3390/genes13101744.
5
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
6
Structural evolution of the protein kinase-like superfamily.
PLoS Comput Biol. 2005 Oct;1(5):e49. doi: 10.1371/journal.pcbi.0010049. Epub 2005 Oct 21.
7
Bayesian coestimation of phylogeny and sequence alignment.
BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
8
Protein sequence comparison based on representation on a finite dimensional unit hypercube.
J Biomol Struct Dyn. 2024 Aug;42(12):6425-6439. doi: 10.1080/07391102.2023.2268719. Epub 2023 Oct 14.
9
Atlas of the Radical SAM Superfamily: Divergent Evolution of Function Using a "Plug and Play" Domain.
Methods Enzymol. 2018;606:1-71. doi: 10.1016/bs.mie.2018.06.004. Epub 2018 Jul 24.
10
SplitTester: software to identify domains responsible for functional divergence in protein family.
BMC Bioinformatics. 2005 Jun 1;6:137. doi: 10.1186/1471-2105-6-137.

Cited by

1
The grand biological universe: A comprehensive geometric construction of genome space.
Innovation (Camb). 2025 Apr 30;6(8):100937. doi: 10.1016/j.xinn.2025.100937. eCollection 2025 Aug 4.
2
Unsupervised identification of significant lineages of SARS-CoV-2 through scalable machine learning methods.
Proc Natl Acad Sci U S A. 2024 Mar 19;121(12):e2317284121. doi: 10.1073/pnas.2317284121. Epub 2024 Mar 13.
3
Classification of Protein Sequences by a Novel Alignment-Free Method on Bacterial and Virus Families.
Genes (Basel). 2022 Sep 27;13(10):1744. doi: 10.3390/genes13101744.
4
The Resolved Mutual Information Function as a Structural Fingerprint of Biomolecular Sequences for Interpretable Machine Learning Classifiers.
Entropy (Basel). 2021 Oct 17;23(10):1357. doi: 10.3390/e23101357.
5
Simple Classification of RNA Sequences of Respiratory-Related Coronaviruses.
ACS Omega. 2021 Jul 28;6(31):20158-20165. doi: 10.1021/acsomega.1c01625. eCollection 2021 Aug 10.
6
A novel numerical representation for proteins: Three-dimensional Chaos Game Representation and its Extended Natural Vector.
Comput Struct Biotechnol J. 2020 Jul 15;18:1904-1913. doi: 10.1016/j.csbj.2020.07.004. eCollection 2020.