Kumar Manish, Thakur Varun, Raghava Gajendra P S
Institute of Microbial Technology, Sector 39-A, Chandigarh, India.
In Silico Biol. 2008;8(2):121-8.
Many methods have been developed for predicting various characteristics of a protein from its amino acid composition. To exploit the full potential of protein composition, we developed the web server COPid, which helps researchers annotate the function of a protein from the composition of the whole protein or a part of it. COPid has three modules: search, composition, and analysis. The search module searches protein sequences in six different databases. Search results list database proteins in ascending order of Euclidean distance from, or descending order of compositional similarity to, the query sequence. The composition module calculates the composition of a single sequence and the average composition of a group of sequences. It also computes the composition of various types of amino acids (e.g. charged, polar, and hydrophobic residues). The analysis module provides the following options: i) comparing the composition of two classes of proteins, ii) creating a phylogenetic tree based on composition, and iii) generating input patterns for machine learning techniques. We evaluated the performance of composition-based (alignment-free) similarity search in predicting the subcellular localization of proteins. The alignment-free method was found to perform reasonably well in predicting certain classes of proteins. The COPid web server is available at http://www.imtech.res.in/raghava/copid/.
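The composition-based search described above can be illustrated with a minimal sketch. This is not COPid's implementation: the function names (`composition`, `euclidean_distance`, `rank_by_distance`), the use of percent composition over the 20 standard amino acids, and the toy in-memory "database" are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the percent composition of the 20 standard amino acids.

    Non-standard characters are ignored (an assumption of this sketch).
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def euclidean_distance(a, b):
    """Euclidean distance between two composition vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query, database):
    """Rank database entries in ascending order of distance to the query.

    `database` is a hypothetical dict mapping a protein name to its sequence.
    """
    q = composition(query)
    scored = [
        (name, euclidean_distance(q, composition(seq)))
        for name, seq in database.items()
    ]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    toy_db = {"p1": "AAAA", "p2": "CCCC", "p3": "AAAC"}
    for name, dist in rank_by_distance("AAAC", toy_db):
        print(f"{name}\t{dist:.2f}")
```

Because the comparison uses only residue frequencies, two sequences with the same composition but different residue order are at distance zero; this is what makes the search alignment-free.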