Szabadka Zoltán, Grolmusz Vince
Dept. of Comput. Sci., Eötvös Univ., Budapest, Hungary.
Conf Proc IEEE Eng Med Biol Soc. 2006;2006:5755-8. doi: 10.1109/IEMBS.2006.259331.
A method for automatically analyzing structures deposited in the Protein Data Bank is presented. The method is capable of detecting missing atoms, bond length deviations, and atom bumps, and of correctly identifying protein-ligand complexes. The results are organized into a database called the Rich Structure PDB (RS-PDB for short), from which one can easily select PDB entries satisfying diverse sets of requirements. The newer and richer mmCIF format of both the PDB and its chemical component dictionary (formerly the HET Group Dictionary) was used in the construction, and the International Chemical Identifier (InChI) of IUPAC played a central role in correctly identifying distinct ligands.
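The abstract does not specify how the geometric checks are implemented. As a rough illustration only, the following sketch shows one way bond length deviations and atom bumps could be flagged from atomic coordinates. The atom records, the reference bond lengths, the tolerance, and the bump cutoff are all assumptions made for this example and are not taken from RS-PDB.

```python
import math
from itertools import combinations

# Hypothetical atom records: (name, x, y, z), coordinates in angstroms.
ATOMS = [
    ("N", 0.000, 0.000, 0.000),
    ("CA", 1.458, 0.000, 0.000),
    ("C", 2.000, 1.420, 0.000),
    ("O", 2.100, 1.500, 0.000),  # deliberately misplaced for the demo
]

# Assumed reference bond lengths in angstroms (illustrative values only).
EXPECTED_BONDS = {("N", "CA"): 1.46, ("CA", "C"): 1.52, ("C", "O"): 1.23}


def distance(a, b):
    """Euclidean distance between two atom records."""
    return math.dist(a[1:], b[1:])


def bond_length_deviations(atoms, expected, tolerance=0.05):
    """Return bonded pairs whose length differs from the reference
    by more than `tolerance` (an assumed threshold)."""
    by_name = {atom[0]: atom for atom in atoms}
    flagged = []
    for (n1, n2), ref in expected.items():
        d = distance(by_name[n1], by_name[n2])
        if abs(d - ref) > tolerance:
            flagged.append((n1, n2, round(d, 3)))
    return flagged


def find_bumps(atoms, bonded, min_dist=2.0):
    """Return non-bonded pairs closer than `min_dist` (an assumed cutoff)."""
    flagged = []
    for a, b in combinations(atoms, 2):
        if (a[0], b[0]) in bonded or (b[0], a[0]) in bonded:
            continue  # bonded atoms are expected to be close
        if distance(a, b) < min_dist:
            flagged.append((a[0], b[0]))
    return flagged


deviations = bond_length_deviations(ATOMS, EXPECTED_BONDS)
bumps = find_bumps(ATOMS, set(EXPECTED_BONDS))
```

In this toy input the misplaced oxygen produces both a flagged C-O bond and a CA-O bump. A real pipeline would read coordinates and bonding information from the mmCIF files and the chemical component dictionary instead of hard-coded tables.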