Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets.

Affiliation

Biological Sciences and Bioengineering, Sabanci University, Orhanli, Tuzla, Istanbul, Turkey.

Publication information

BMC Bioinformatics. 2010 Aug 18;11:428. doi: 10.1186/1471-2105-11-428.

Abstract

BACKGROUND

Phylogenetic analysis can be used to divide a protein family into subfamilies in the absence of experimental information. Most phylogenetic analysis methods rely on a multiple alignment of the sequences and are based on an evolutionary model. However, multiple alignment is not a fully automated procedure: it requires human intervention to maintain alignment integrity and to produce phylogenies consistent with the functional splits in the underlying sequences. To address this problem, we propose using the alignment-free Relative Complexity Measure (RCM) combined with reduced amino acid alphabets to cluster protein families into functional subtypes purely on sequence criteria. We also compared the results with an alignment-based approach to test the quality of the clustering.

RESULTS

We demonstrate the robustness of RCM with reduced alphabets in clustering of protein sequences into families in a simulated dataset and seven well-characterized protein datasets. On protein datasets, crotonases, mandelate racemases, nucleotidyl cyclases and glycoside hydrolase family 2 were clustered into subfamilies with 100% accuracy whereas acyl transferase domains, haloacid dehalogenases, and vicinal oxygen chelates could be assigned to subfamilies with 97.2%, 96.9% and 92.2% accuracies, respectively.

CONCLUSIONS

The combination of methods in this paper is useful for clustering protein families into subtypes based solely on protein sequence information. The method is also flexible and computationally fast because it does not require multiple alignment of sequences.
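The pipeline described in the abstract (reduce the alphabet, compute an alignment-free complexity distance, then cluster) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the alphabet or the exact distance formula, so the six-group alphabet in `GROUPS`, the Lempel-Ziv phrase count used as the complexity, the normalized distance in `lz_distance`, and the average-linkage clustering in `cluster` are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical 6-letter reduced alphabet (illustrative grouping, not from the paper).
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq):
    """Map each residue to the representative letter of its group ('X' if unknown)."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def lz_complexity(s):
    """Number of phrases in a left-to-right Lempel-Ziv (1976-style) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(s, q):
    """Assumed normalized distance: extra phrases needed to describe one
    sequence given the other, divided by the larger individual complexity."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    if max(cs, cq) == 0:
        return 0.0
    return max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq) / max(cs, cq)


def cluster(seqs, k):
    """Average-linkage agglomerative clustering of sequences into k groups.
    Returns a list of clusters, each a list of indices into seqs."""
    reduced = [reduce_sequence(s) for s in seqs]
    dist = {(i, j): lz_distance(reduced[i], reduced[j])
            for i, j in combinations(range(len(reduced)), 2)}

    def avg(a, b):
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(reduced))]
    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        clusters[x] += clusters.pop(y)  # x < y, so index x stays valid
    return clusters


# Toy input: sequences 0/1 and 2/3 become identical after alphabet reduction.
toy = ["AAAAAAAA", "VVVVVVVV", "KKKKKKKK", "RRRRRRRR"]
print(cluster(toy, 2))
```

In this toy input the alphabet reduction makes the A-rich and V-rich sequences indistinguishable, which is the intended effect of a reduced alphabet: residues with similar properties stop contributing to the distance. Real protein families would need a tree-building or clustering step suited to the data, and the pairwise distance computation here is quadratic in the number of sequences.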

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa2d/2936399/cf4e1cc7c700/1471-2105-11-428-2.jpg
