Casbon James A, Crooks Gavin E, Saqi Mansoor A S
Bioinformatics, Institute of Cell and Molecular Science, School of Medicine and Dentistry, Queen Mary, University of London, London EC1 6BQ, UK.
BMC Bioinformatics. 2006 Jan 10;7:10. doi: 10.1186/1471-2105-7-10.
Benchmarking algorithms in structural bioinformatics often involves constructing datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification that groups proteins together on the basis of structural similarity. The ASTRAL compendium provides non-redundant subsets of SCOP domains, chosen so that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together, these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small, easy-to-use API written in Python to enable construction of datasets from these resources.
We have designed a set of Python modules that provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within Python programs, use ASTRAL to retrieve the sequences of SCOP domains, and obtain clustered representations of SCOP from ASTRAL.
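To make the idea of manipulating the SCOP hierarchy concrete, the following is a minimal, self-contained sketch and not the API of the modules described here. It assumes only that each domain carries a SCOP concise classification string (sccs) of the form class.fold.superfamily.family; the function names (`sccs_prefix`, `group_domains`) and the toy domain identifiers are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# An sccs string encodes four hierarchy levels: class.fold.superfamily.family
LEVELS = ("class", "fold", "superfamily", "family")


def sccs_prefix(sccs, level):
    """Return the sccs string truncated at the given hierarchy level."""
    parts = sccs.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"unexpected sccs string: {sccs!r}")
    depth = LEVELS.index(level) + 1
    return ".".join(parts[:depth])


def group_domains(domains, level):
    """Group domain identifiers by their classification at `level`.

    `domains` is an iterable of (domain_id, sccs) pairs.
    """
    groups = defaultdict(list)
    for domain_id, sccs in domains:
        groups[sccs_prefix(sccs, level)].append(domain_id)
    return dict(groups)


# Hypothetical toy records (domain id, sccs); these are not real SCOP entries.
toy_domains = [
    ("dom_a", "a.1.1.1"),
    ("dom_b", "a.1.1.2"),
    ("dom_c", "a.1.2.1"),
    ("dom_d", "b.1.1.1"),
]

by_superfamily = group_domains(toy_domains, "superfamily")
```

Grouping at the superfamily or fold level in this way is the kind of step a benchmark might use to decide which domain pairs count as structurally related; in practice the classification would come from the SCOP files rather than a hand-written list.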
The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.