Top-down clustering for protein subfamily identification.

Affiliation

Department of Computer Science, KU Leuven, Belgium.

Publication information

Evol Bioinform Online. 2013 May 6;9:185-202. doi: 10.4137/EBO.S11609. Print 2013.

DOI:10.4137/EBO.S11609

PMID:23700359

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3653887/

Abstract

We propose a novel method for protein subfamily identification, that is, for finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree from a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and it associates particular mutations with each division into subclusters. The motivating hypothesis is that this approach may yield a better tree topology, and hence more accurate subfamily identification, while additionally indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis: the novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
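
The idea of splitting an alignment top-down, with each split tied to a residue at a specific position, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the split heuristic or the pruning criterion, so the `disagreement` score, the `min_gain` stopping threshold (used here in place of post-pruning), and all function names are assumptions chosen only for illustration.

```python
from collections import Counter


def disagreement(seqs):
    """Count residues that differ from the majority residue of their column.

    Assumed heterogeneity measure; the paper's actual heuristic is not
    given in the abstract.
    """
    total = 0
    for column in zip(*seqs):
        counts = Counter(column)
        total += len(column) - counts.most_common(1)[0][1]
    return total


def best_split(seqs):
    """Return (gain, position, residue, inside, outside) for the best test.

    A test has the form "sequence has `residue` at `position`". Returns
    None when no test separates the sequences into two non-empty groups.
    """
    base = disagreement(seqs)
    best = None
    for pos in range(len(seqs[0])):
        for residue in sorted(set(s[pos] for s in seqs)):
            inside = [s for s in seqs if s[pos] == residue]
            outside = [s for s in seqs if s[pos] != residue]
            if not inside or not outside:
                continue
            gain = base - disagreement(inside) - disagreement(outside)
            if best is None or gain > best[0]:
                best = (gain, pos, residue, inside, outside)
    return best


def build_tree(seqs, min_gain=2):
    """Recursively split aligned sequences top-down.

    Stops when the best test gains less than `min_gain` (a simple stand-in
    for the post-pruning step described in the abstract).
    """
    split = best_split(seqs)
    if split is None or split[0] < min_gain:
        return {"leaf": seqs}
    _, pos, residue, inside, outside = split
    return {
        "test": (pos, residue),
        "yes": build_tree(inside, min_gain),
        "no": build_tree(outside, min_gain),
    }


def classify(tree, seq):
    """Assign a new aligned sequence to a cluster by following the tests."""
    while "leaf" not in tree:
        pos, residue = tree["test"]
        tree = tree["yes"] if seq[pos] == residue else tree["no"]
    return tree["leaf"]


# Toy alignment (made-up sequences, equal length, no gaps).
alignment = ["AKLV", "AKLV", "AKLI", "GRLV", "GRMV", "GRMV"]
tree = build_tree(alignment)
print(tree["test"])            # position and residue chosen at the root
print(classify(tree, "AKMI"))  # cluster assigned to an unseen sequence
```

Because every internal node stores an explicit position and residue, a new sequence is classified by checking a handful of alignment columns, which is the property the abstract highlights when it compares classification accuracy with hidden Markov models.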
