Department of Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States.
Koliber Biosciences, Inc., 12265 World Trade Drive, Suite G, San Diego, California 92128, United States.
J Chem Inf Model. 2021 Jun 28;61(6):3074-3090. doi: 10.1021/acs.jcim.1c00573. Epub 2021 Jun 14.
In recent years, therapeutic peptides have gained a lot of interest, as demonstrated by the 60 peptides approved as drugs in major markets and the 150+ peptides currently in clinical trials. However, while small molecule docking is routinely used in rational drug design efforts, docking peptides has proven challenging, partly because docking scoring functions, developed and calibrated for small molecules, perform poorly on peptides. Here, we present random forest classifiers trained to discriminate correctly docked peptides. We show that, for a testing set of 47 protein-peptide complexes that are structurally dissimilar from the training set and were previously used to benchmark AutoDock Vina's ability to dock short peptides, these random forest classifiers improve docking power from ∼25% for AutoDock scoring functions to an average of ∼70%. These results pave the way for peptide-docking success rates comparable to those of small molecule docking. To develop these classifiers, we compiled the ProptPep37_2021 data set, a curated, high-quality set of 322 crystallographic protein-peptide complexes annotated with structural similarity information. The data set also provides a collection of high-quality putative poses with a range of deviations from the crystallographic pose, providing correct and incorrect poses (i.e., decoys) of the peptide for each entry. The ProptPep37_2021 data set as well as the classifiers presented here are freely available.
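The abstract reports docking power without defining it. A minimal sketch follows, assuming the common definition: the fraction of complexes whose top-scored pose lies within an RMSD cutoff of the crystallographic pose. The function name `docking_power`, the 2.0 Å default cutoff, the convention that a higher score means a more likely correct pose, and the example data are all illustrative assumptions, not values or code from the paper.

```python
from collections import defaultdict


def docking_power(poses, rmsd_cutoff=2.0):
    """Return the fraction of complexes whose top-scored pose is a hit.

    poses: iterable of (complex_id, score, rmsd) tuples. A higher score
    is assumed to mean the pose is judged more likely to be correct.
    rmsd_cutoff: hypothetical threshold in angstroms (an assumption, not
    a value taken from the paper).
    """
    by_complex = defaultdict(list)
    for complex_id, score, rmsd in poses:
        by_complex[complex_id].append((score, rmsd))
    if not by_complex:
        raise ValueError("no poses given")

    hits = 0
    for scored in by_complex.values():
        # Pick the pose the scorer ranks first for this complex.
        _, top_rmsd = max(scored, key=lambda pose: pose[0])
        if top_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(by_complex)


# Invented illustration data: (complex_id, score, RMSD to crystal pose).
example = [
    ("cplx_A", 0.91, 1.2),
    ("cplx_A", 0.40, 6.5),
    ("cplx_B", 0.75, 8.1),  # top-scored pose is a decoy, so this is a miss
    ("cplx_B", 0.60, 1.8),
    ("cplx_C", 0.88, 0.9),
    ("cplx_C", 0.15, 4.3),
    ("cplx_D", 0.55, 1.5),
    ("cplx_D", 0.52, 7.0),
]

power = docking_power(example)
print(f"docking power: {power:.2f}")
```

Under this definition, a scorer can only raise docking power by ranking a near-native pose above every decoy for a given complex. Improving scores for poses that are not ranked first has no effect on the metric.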