Wishart David S, Arndt David, Berjanskii Mark, Guo An Chi, Shi Yi, Shrivastava Savita, Zhou Jianjun, Zhou You, Lin Guohui
Department of Biological Sciences, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.
Nucleic Acids Res. 2008 Jan;36(Database issue):D222-9. doi: 10.1093/nar/gkm800. Epub 2007 Oct 4.
The protein property prediction and testing database (PPT-DB) is a database housing nearly 30 carefully curated databases, each of which contains commonly predicted protein property information. These properties include both structural (e.g. secondary structure, contact order, disulfide pairing) and dynamic (e.g. order parameters, B-factors, folding rates) features that have been measured, derived or tabulated from a variety of sources. PPT-DB is designed to serve two purposes. First, it is intended to serve as a centralized, up-to-date, freely downloadable and easily queried repository of predictable or 'derived' protein property data. In this role, PPT-DB can serve as a one-stop, fully standardized repository where developers can obtain the training, testing and validation data needed for almost any kind of protein property prediction program they may wish to create. Second, PPT-DB can serve as a tool for homology-based protein property prediction. Users may query PPT-DB with a sequence of interest and have a specific property predicted by a sequence similarity search against PPT-DB's extensive collection of proteins with known properties. PPT-DB exploits the well-known fact that protein structure and dynamic properties are highly conserved between homologous proteins. Predictions derived from PPT-DB's similarity searches are typically 85-95% correct (for categorical predictions, such as secondary structure) or exhibit correlations of >0.80 (for numeric predictions, such as accessible surface area). This performance is 10-20% better than what is typically obtained from standard 'ab initio' predictions. PPT-DB, its prediction utilities and all of its contents are available at http://www.pptdb.ca.
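The homology-based prediction and the two accuracy measures described above (percent correct for categorical properties, correlation for numeric ones) can be illustrated with a short sketch. This is not PPT-DB's actual code. It assumes a pairwise alignment between the query and a PPT-DB protein with known properties has already been obtained (for instance from a BLAST hit), and the function names `transfer_labels`, `percent_correct` and `pearson` are hypothetical.

```python
from math import sqrt
from typing import List, Optional, Sequence


def transfer_labels(query_aln: str, template_aln: str,
                    template_labels: Sequence) -> List[Optional[object]]:
    """Copy per-residue property labels from an aligned template to the query.

    query_aln and template_aln are the two rows of a pairwise alignment
    (equal length, '-' marks a gap). template_labels holds one label per
    ungapped template residue. Query residues aligned to a gap get None.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    predicted: List[Optional[object]] = []
    t_idx = 0  # position in the ungapped template sequence
    for q_char, t_char in zip(query_aln, template_aln):
        if q_char != "-":
            # Query residue present: inherit the template label if aligned.
            predicted.append(template_labels[t_idx] if t_char != "-" else None)
        if t_char != "-":
            t_idx += 1
    return predicted


def percent_correct(predicted: Sequence, observed: Sequence) -> float:
    """Categorical accuracy; positions with no prediction count as wrong."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient for numeric predictions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy secondary-structure labels (H = helix, E = strand, C = coil).
    pred = transfer_labels("MKV-LA", "MKVGL-", list("HHHEE"))
    print(pred)                                  # ['H', 'H', 'H', 'E', None]
    print(percent_correct(pred, list("HHHEC")))  # 80.0
    print(round(pearson([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]), 3))
```

Leaving unaligned query residues as `None` (rather than guessing a default class) keeps the sketch honest about coverage; a real tool would likely fall back to another hit or an ab initio method for those positions.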