Gil Víctor A, Guallar Víctor
Joint BSC-CRG-IRB Research Program in Computational Biology, Barcelona Supercomputing Center, Jordi Girona 29, 08034 Barcelona, Spain.
Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, E-08010 Barcelona, Spain.
J Chem Theory Comput. 2014 Aug 12;10(8):3236-43. doi: 10.1021/ct500306s. Epub 2014 Jul 28.
Cluster analysis is becoming an important tool in structural bioinformatics. It makes it possible to analyze large conformational ensembles in order to extract features or reduce redundancy, or simply to serve as a first step for other methods. Unfortunately, the success of this analysis depends strongly on the traits of the data set, the chosen algorithm, and its parameters, and a poor choice can lead to poor or even erroneous results that are not easily detected. To overcome this problem, we have developed pyProCT, an open source Python cluster analysis toolkit specifically designed for use with ensembles of biomolecule conformations. pyProCT implements an automated protocol that chooses the clustering algorithm and parameters that produce the best results for a particular data set. It offers different levels of customization according to the user's expertise. Moreover, pyProCT has been designed as a collection of interchangeable libraries, making it easier to reuse as part of other programs.
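The idea of automatically choosing an algorithm and its parameters can be illustrated with a minimal sketch. It does not use pyProCT's API and is not the authors' protocol. It assumes two simple stand-in algorithms (a deterministic k-medoids and a greedy cutoff-based clustering), a small hypothetical parameter grid, and the mean silhouette coefficient as the quality criterion. The abstract does not say which algorithms or criteria pyProCT uses, so all function names and defaults below are illustrative.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances (a stand-in for e.g. an RMSD matrix)."""
    return [[math.dist(p, q) for q in points] for p in points]


def kmedoids(dist, k, max_iter=100):
    """Deterministic k-medoids with farthest-first seeding (illustrative)."""
    n = len(dist)
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(max_iter):
        # Assign every element to its closest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def cutoff_clustering(dist, cutoff):
    """Greedy clustering: repeatedly extract the element with most neighbours."""
    labels = [-1] * len(dist)
    remaining = set(range(len(dist)))
    cluster_id = 0
    while remaining:
        center = max(sorted(remaining),
                     key=lambda i: sum(dist[i][j] <= cutoff for j in remaining))
        for j in [j for j in remaining if dist[center][j] <= cutoff]:
            labels[j] = cluster_id
            remaining.discard(j)
        cluster_id += 1
    return labels


def silhouette(dist, labels):
    """Mean silhouette coefficient; singletons contribute 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0  # a single cluster cannot be scored; rank it last
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def best_clustering(points, ks=(2, 3, 4, 5), cutoffs=(0.5, 1.0, 2.0)):
    """Try every (algorithm, parameter) pair and keep the best-scoring result."""
    dist = distance_matrix(points)
    candidates = [(("kmedoids", k), kmedoids(dist, k)) for k in ks if k < len(points)]
    candidates += [(("cutoff", c), cutoff_clustering(dist, c)) for c in cutoffs]
    scored = [(silhouette(dist, labels), name, labels) for name, labels in candidates]
    return max(scored, key=lambda item: item[0])


if __name__ == "__main__":
    # Three well-separated toy groups standing in for conformations.
    toy = [(0, 0), (0.1, 0.2), (0.2, 0.1),
           (5, 5), (5.1, 5.2), (5.2, 5.1),
           (10, 0), (10.1, 0.2), (10.2, 0.1)]
    score, name, labels = best_clustering(toy)
    print(name, round(score, 3), labels)
```

A single score is a deliberate simplification. A real protocol would likely combine several quality criteria and filter out degenerate clusterings before ranking them.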