Ansari Elnaz Saberi, Eslahchi Changiz, Pezeshk Hamid, Sadeghi Mehdi
Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
Proteins. 2014 Sep;82(9):1937-46. doi: 10.1002/prot.24547. Epub 2014 Mar 24.
Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in the PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more pressing. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph-theoretical approach. The algorithm is based on center-based clustering. To construct the initial clusters, the members of an independent dominating set of the protein's graph representation are taken as the centers. A distance matrix is then defined for these clusters. To obtain the final domains, the clusters are merged using the compactness principle of domains and a procedure similar to the neighbor-joining algorithm, subject to a set of thresholds. The thresholds are computed from a training set of 50 protein chains. The algorithm is implemented in C++ and is named ProDomAs. To assess the performance of ProDomAs, its results are compared with those of seven automatic methods on five publicly available benchmarks. The results show that ProDomAs outperforms the other methods on these benchmarks. The performance of ProDomAs is also evaluated on 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas.
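The initial clustering step can be illustrated with a minimal C++ sketch. It is not the ProDomAs implementation: the abstract does not say how the protein graph is built or how the independent dominating set is chosen, so the adjacency-list representation, the degree-based visiting order, and the names `Graph`, `independentDominatingSet`, and `initialClusters` are all assumptions made for illustration. The sketch relies only on the standard fact that any maximal independent set of a graph is also a dominating set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Undirected graph as adjacency lists. Treating vertex i as residue i is an
// assumption; the abstract does not describe the graph construction.
using Graph = std::vector<std::vector<std::size_t>>;

// Greedy maximal independent set. A maximal independent set is also a
// dominating set, so the result is an independent dominating set.
// Visiting vertices by decreasing degree is an illustrative heuristic only.
std::vector<std::size_t> independentDominatingSet(const Graph& g) {
    std::vector<std::size_t> order(g.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&g](std::size_t a, std::size_t b) {
                         return g[a].size() > g[b].size();
                     });

    // blocked[v] is true once v is a center or adjacent to a center.
    std::vector<bool> blocked(g.size(), false);
    std::vector<std::size_t> centers;
    for (std::size_t v : order) {
        if (blocked[v]) continue;
        centers.push_back(v);
        blocked[v] = true;
        for (std::size_t u : g[v]) {
            assert(u < g.size());  // adjacency entries must be valid vertices
            blocked[u] = true;
        }
    }
    return centers;
}

// Initial clusters: each center labels itself, and every other vertex takes
// the label of the first chosen center adjacent to it. Domination guarantees
// that every vertex receives a label.
std::vector<std::size_t> initialClusters(const Graph& g,
                                         const std::vector<std::size_t>& centers) {
    const std::size_t none = g.size();  // sentinel for "not yet labeled"
    std::vector<std::size_t> label(g.size(), none);
    for (std::size_t c : centers) label[c] = c;
    for (std::size_t c : centers)
        for (std::size_t u : g[c])
            if (label[u] == none) label[u] = c;
    return label;
}
```

On a five-vertex path 0-1-2-3-4, this sketch selects vertices 1 and 3 as centers and groups the vertices as {0, 1, 2} and {3, 4}. The subsequent merging of clusters by compactness and thresholds is not sketched here, since the abstract gives no details of the distance matrix or threshold values.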