Terwilliger Thomas C, Adams Paul D, Moriarty Nigel W, Cohn Judith D
Los Alamos National Laboratory, Mailstop M888, Los Alamos, NM 87545, USA.
Acta Crystallogr D Biol Crystallogr. 2007 Jan;63(Pt 1):101-7. doi: 10.1107/S0907444906046233. Epub 2006 Dec 13.
A procedure for the identification of ligands bound in crystal structures of macromolecules is described. Two characteristics of the density corresponding to a ligand are used in the identification procedure. One is the correlation of the ligand density with each of a set of test ligands after optimization of the fit of that ligand to the density. The other is the correlation of a fingerprint of the density with the fingerprint of model density for each possible ligand. The fingerprints consist of an ordered list of correlations of each of the test ligands with the density. The two characteristics are scored using a Z-score approach in which the correlations are normalized to the mean and standard deviation of correlations found for a variety of mismatched ligand-density pairs, so that the Z-scores are related to the probability of observing a particular value of the correlation by chance. The procedure was tested with a set of 200 of the most commonly found ligands in the Protein Data Bank, collectively representing 57% of all ligands in the Protein Data Bank. Using a combination of these two characteristics of ligand density, ranked lists of ligand identifications were made for representative (Fo - Fc)exp(iφc) difference density from entries in the Protein Data Bank. In 48% of the 200 cases, the correct ligand was at the top of the ranked list of ligands. This approach may be useful in identifying unknown ligands in new macromolecular structures, as well as in identifying which ligands in a mixture have bound to a macromolecule.
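The scoring scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of a Pearson correlation between fingerprints, the population standard deviation, and the simple sum of the two Z-scores are all assumptions made for illustration, since the abstract only states that the two characteristics are combined. The numbers are invented placeholders, not results from the paper.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def z_score(value, mismatched):
    """Normalize a correlation to the mean and standard deviation of
    correlations observed for mismatched ligand-density pairs."""
    return (value - mean(mismatched)) / pstdev(mismatched)


def rank_ligands(order, fit_cc, model_fps, mismatched_fit, mismatched_fp):
    """Return (ligand, combined score) pairs, best candidate first.

    order          -- fixed ordering of the test ligands
    fit_cc         -- correlation of each fitted test ligand with the density
    model_fps      -- fingerprint of model density for each candidate ligand
    mismatched_*   -- reference correlations from mismatched pairs
    """
    # Fingerprint of the observed density: the ordered list of
    # correlations of each test ligand with that density.
    observed_fp = [fit_cc[name] for name in order]
    scored = []
    for name in order:
        z_fit = z_score(fit_cc[name], mismatched_fit)
        z_fp = z_score(pearson(observed_fp, model_fps[name]), mismatched_fp)
        # Assumption: the two Z-scores are combined by simple addition.
        scored.append((name, z_fit + z_fp))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented example values for three test ligands.
order = ["ATP", "NAD", "HEM"]
fit_cc = {"ATP": 0.82, "NAD": 0.70, "HEM": 0.55}
model_fps = {
    "ATP": [0.90, 0.72, 0.50],
    "NAD": [0.75, 0.90, 0.60],
    "HEM": [0.50, 0.60, 0.90],
}
mismatched_fit = [0.50, 0.55, 0.60, 0.65, 0.70]
mismatched_fp = [-0.2, 0.0, 0.1, 0.3, 0.4]

ranked = rank_ligands(order, fit_cc, model_fps, mismatched_fit, mismatched_fp)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

With these placeholder values the candidate whose fitted correlation and fingerprint both agree with the density is ranked first. A real application would need reference distributions built from many mismatched ligand-density pairs, and a guard against a zero standard deviation.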