Barthel Daniel, Hirst Jonathan D, Błazewicz Jacek, Burke Edmund K, Krasnogor Natalio
ASAP, School of Computer Science and IT, University of Nottingham, Nottingham, NG8 1BB, UK.
BMC Bioinformatics. 2007 Oct 26;8:416. doi: 10.1186/1471-2105-8-416.
We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy-to-use interface that allows multiple proteins to be compared simultaneously. It employs the Universal Similarity Metric (USM) and the Maximum Contact Map Overlap (MaxCMO) of protein structures, together with external methods such as DaliLite, TM-align, the Combinatorial Extension (CE) of the optimal path, and the FAST Align and Search Tool (FAST). Additionally, ProCKSI allows the user to upload a user-defined similarity matrix to supplement these methods, and computes a similarity consensus in order to provide a rich, integrated, multicriteria view of large datasets of protein structures.
We present ProCKSI's architecture and workflow, describe its intuitive user interface, and show its potential on three distinct test cases. In the first case, ProCKSI is used to evaluate the results of a previous CASP competition, assessing the similarity of proposed models for given targets where the structures can deviate substantially from one another. To perform this type of comparison reliably, we introduce a new consensus method. The second study verifies a classification scheme for protein kinases that Hanks and Hunter originally derived by sequence comparison; here we use a consensus similarity measure based on structures instead. In the third experiment, using the Rost and Sander dataset (RS126), we investigate how combining different sets of similarity measures influences the quality and performance of ProCKSI's new consensus measure. ProCKSI performs well with all three datasets, showing its potential for complex, simultaneous multi-method assessment of structural similarity in large protein datasets. Furthermore, combining different similarity measures is usually more robust than relying on a single measure.
Based on a diverse set of similarity measures, ProCKSI computes a consensus similarity profile for the entire protein set. All results can be clustered, visualised, analysed and easily compared with each other through a simple and intuitive interface. ProCKSI is publicly available at http://www.procksi.net for academic and non-commercial use.
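The abstract does not say how the consensus is computed, so the following is only an illustrative sketch of one plausible way to combine several pairwise similarity matrices into a single profile. It assumes that every measure is oriented so that higher scores mean more similar, that each matrix is min-max normalised to [0, 1], and that the consensus is an unweighted mean. The function names and the example scores are hypothetical and are not taken from ProCKSI.

```python
from typing import Dict, List

Matrix = List[List[float]]


def min_max_normalise(matrix: Matrix) -> Matrix:
    """Rescale all entries of one similarity matrix to the range [0, 1]."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant matrix carries no ranking information; map it to zeros.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def consensus_similarity(measures: Dict[str, Matrix]) -> Matrix:
    """Average the normalised matrices entry by entry (assumed scheme)."""
    if not measures:
        raise ValueError("at least one similarity matrix is required")
    normalised = [min_max_normalise(m) for m in measures.values()]
    n = len(normalised[0])
    for m in normalised:
        if len(m) != n or any(len(row) != n for row in m):
            raise ValueError("all matrices must be square and the same size")
    k = len(normalised)
    return [
        [sum(m[i][j] for m in normalised) / k for j in range(n)]
        for i in range(n)
    ]


# Hypothetical scores for three proteins under two made-up measures.
example_measures = {
    "measure_a": [[10, 6, 2], [6, 10, 4], [2, 4, 10]],
    "measure_b": [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]],
}
consensus = consensus_similarity(example_measures)
for row in consensus:
    print(row)
```

Normalising before averaging keeps a measure with a large numeric range from dominating the others. A real system might instead weight the measures, convert distances to similarities first, or use rank-based aggregation.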