Hooft R W, Sander C, Scharf M, Vriend G
Comput Appl Biosci. 1996 Dec;12(6):525-9. doi: 10.1093/bioinformatics/12.6.525.
The Protein Data Bank currently contains more than 4700 protein coordinate sets. It is often desirable to select from these files based on a criterion such as R-factor, experimental method, length of the amino acid sequence, or the number of homologous sequences in SWISSPROT. Doing this with the distributed form of the Protein Data Bank can be tedious, because (1) it requires reading a separate file for every entry, and (2) not all of the information is present in a consistent, computer-readable way in all of the entries.
The PDBFINDER database provides a single, easy-to-interpret file containing summary information about all Protein Data Bank files. Summary information from the DSSP (Definition of Secondary Structure of Proteins) and HSSP (Homology derived Secondary Structure of Proteins) databases is also included. Furthermore, where essential data were missing from a Protein Data Bank file, they have been retrieved from the original literature.
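Because all summaries live in one file, a selection like "X-ray entries with R-factor at most 0.20" becomes a single pass over that file instead of opening thousands of PDB entries. The sketch below illustrates this under stated assumptions: it treats the file as blocks of `Key : value` lines separated by `//`, and the field names `ID`, `Exp-Method` and `R-Factor`, the helper names `parse_records` and `select`, and the sample entries are all hypothetical illustrations, not the documented PDBFINDER format.

```python
"""Sketch: filter a PDBFINDER-style flat file in one pass.

ASSUMPTION: records are blocks of "Key : value" lines separated by a line
containing only "//". Field names ("ID", "Exp-Method", "R-Factor") and the
sample data are hypothetical and may not match the real file.
"""
from typing import Dict, Iterable, Iterator, List


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record; the first occurrence of a key wins."""
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "//":
            if record:
                yield record
            record = {}
            continue
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a "Key : value" separator
            record.setdefault(key.strip(), value.strip())
    if record:  # the last record may lack a trailing "//"
        yield record


def select(records: Iterable[Dict[str, str]], max_r: float, method: str) -> List[str]:
    """Return IDs of entries with the given method and R-factor <= max_r."""
    hits: List[str] = []
    for rec in records:
        try:
            r_factor = float(rec.get("R-Factor", ""))
        except ValueError:
            continue  # missing or non-numeric R-factor: exclude the entry
        if rec.get("Exp-Method") == method and r_factor <= max_r and "ID" in rec:
            hits.append(rec["ID"])
    return hits


# Made-up sample in the assumed format.
SAMPLE = """ID           : 1abc
Exp-Method   : X
R-Factor     : 0.18
//
ID           : 2xyz
Exp-Method   : NMR
//
ID           : 3def
Exp-Method   : X
R-Factor     : 0.27
//
"""

print(select(parse_records(SAMPLE.splitlines()), max_r=0.20, method="X"))  # ['1abc']
```

Entries lacking the selection field (here the NMR entry with no R-factor) are excluded rather than guessed, which mirrors the abstract's point that missing data is the main obstacle to automated selection.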
The latest version of the PDBFINDER database can be downloaded by anonymous ftp from swift.embl-heidelberg.de, directory /pdbfinder.
E-mail address: hooft@embl-heidelberg.de.