Rother Kristian, Michalsky Elke, Leser Ulf
Berlin Center of Genome-Based Bioinformatics (BCB), Institute of Biochemistry at the Charité, Humboldt Universität Berlin, Berlin, Germany.
Proteins. 2005 Sep 1;60(4):571-6. doi: 10.1002/prot.20520.
Using existing cross-references between the Protein Data Bank (PDB) and 15 other databases, we investigated the extent to which PDB entries are annotated with second-party information. We report two findings. First, for manually curated secondary databases, there is a clear "annotation gap" for structures less than 7 years old. Second, the examined databases overlap with each other quite well, dividing the PDB into two well-annotated thirds and one poorly annotated third. Both observations should be taken into account in any study that selects protein structures by their annotation.