Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets.

Affiliation

Biological Sciences and Bioengineering, Sabanci University, Orhanli, Tuzla, Istanbul, Turkey.

Publication information

BMC Bioinformatics. 2010 Aug 18;11:428. doi: 10.1186/1471-2105-11-428.

DOI:10.1186/1471-2105-11-428
PMID:20718947
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2936399/
Abstract

BACKGROUND

Phylogenetic analysis can be used to divide a protein family into subfamilies in the absence of experimental information. Most phylogenetic analysis methods utilize multiple alignment of sequences and are based on an evolutionary model. However, multiple alignment is not an automated procedure and requires human intervention to maintain alignment integrity and to produce phylogenies consistent with the functional splits in underlying sequences. To address this problem, we propose to use the alignment-free Relative Complexity Measure (RCM) combined with reduced amino acid alphabets to cluster protein families into functional subtypes purely on sequence criteria. Comparison with an alignment-based approach was also carried out to test the quality of the clustering.
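
The alignment-free idea described above can be illustrated with a minimal sketch. It assumes that RCM is computed from Lempel-Ziv (1976) complexity using one common normalized distance, and it uses a hypothetical six-group alphabet chosen only for illustration. The abstract states neither the exact distance variant nor the reduced alphabets evaluated, so every name and grouping below is an assumption rather than the authors' implementation.

```python
# Hypothetical reduced alphabet (assumption): six broad physicochemical groups.
# These are NOT the alphabets evaluated in the paper.
GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # conformationally special
}
REDUCE = {aa: rep for rep, members in GROUPS.items() for aa in members}


def reduce_alphabet(seq: str) -> str:
    """Map each residue to its group representative; unknown symbols become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def lz_complexity(s: str) -> int:
    """Number of components in the Lempel-Ziv (1976) exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        k = 1
        # Grow the component while it can still be copied from the text seen so far.
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        count += 1
        i += k
    return count


def rcm_distance(s: str, q: str) -> float:
    """Normalized LZ-based distance (assumed variant):
    max(c(SQ) - c(S), c(QS) - c(Q)) / max(c(S), c(Q))."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    if max(cs, cq) == 0:
        return 0.0  # both sequences empty
    return max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq) / max(cs, cq)


def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Pairwise distances on reduced-alphabet sequences; no alignment is needed."""
    reduced = [reduce_alphabet(x) for x in seqs]
    return [[rcm_distance(a, b) for b in reduced] for a in reduced]
```

A matrix produced by `distance_matrix` could then be passed to any distance-based tree-building or hierarchical clustering routine to split a family into candidate subtypes. Because each pair of sequences is compared directly, no multiple alignment step is involved.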

RESULTS

We demonstrate the robustness of RCM with reduced alphabets in clustering of protein sequences into families in a simulated dataset and seven well-characterized protein datasets. On protein datasets, crotonases, mandelate racemases, nucleotidyl cyclases and glycoside hydrolase family 2 were clustered into subfamilies with 100% accuracy whereas acyl transferase domains, haloacid dehalogenases, and vicinal oxygen chelates could be assigned to subfamilies with 97.2%, 96.9% and 92.2% accuracies, respectively.

CONCLUSIONS

The combination of methods in this paper is useful for clustering protein families into subtypes based solely on protein sequence information. The method is also flexible and computationally fast because it does not require multiple alignment of sequences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa2d/2936399/cf4e1cc7c700/1471-2105-11-428-2.jpg
