Department of Chemistry, McGill University, Montreal, Quebec H3A 0B8, Canada.
J Chem Inf Model. 2024 Jul 22;64(14):5617-5623. doi: 10.1021/acs.jcim.4c00461. Epub 2024 Jul 9.
The design of biosequences for biosensing and therapeutics is a challenging multistep search and optimization task. In principle, computational modeling can speed up the design process through virtual screening of sequences based on their binding affinities to target molecules. In practice, however, existing machine-learned models trained to predict binding affinities lack flexibility with respect to reaction conditions, and molecular dynamics simulations, which can incorporate reaction conditions, suffer from high computational costs. Here, we describe a computational approach called DeltaGzip that evaluates the free energy of binding in biopolymer-ligand complexes from ultrashort equilibrium molecular dynamics simulations. The entropy of binding is evaluated using the Kolmogorov complexity definition of entropy and approximated using a lossless compression algorithm, Gzip. We benchmark the method on a well-studied data set of protein-ligand complexes, comparing the predictions of DeltaGzip to the free energies of binding obtained using the Jarzynski equality and to experimental measurements.
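The idea of approximating a Kolmogorov-complexity entropy with a lossless compressor can be illustrated with a minimal sketch. This is not the DeltaGzip implementation: the abstract does not specify how trajectories are encoded or how the compressed size is converted into an entropy of binding. The function names `quantize` and `compressed_bits_per_sample`, the 256-bin discretization, and the toy "ordered" and "disordered" signals are all assumptions made for illustration. The sketch only shows that a more disordered signal yields a larger Gzip-compressed size per sample.

```python
import gzip
import random


def quantize(values, lo, hi, n_bins=256):
    """Map floats in [lo, hi] to bin indices 0..n_bins-1, one byte per sample.

    Hypothetical encoding chosen for this sketch; values outside the range
    are clamped to the first or last bin.
    """
    if not 1 <= n_bins <= 256:
        raise ValueError("n_bins must fit in one byte (1..256)")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / n_bins
    out = bytearray()
    for v in values:
        idx = int((v - lo) / width)
        out.append(min(max(idx, 0), n_bins - 1))
    return bytes(out)


def compressed_bits_per_sample(values, lo, hi, n_bins=256):
    """Gzip-compressed size in bits per sample, used as a crude entropy proxy."""
    raw = quantize(values, lo, hi, n_bins)
    if not raw:
        raise ValueError("values must not be empty")
    return 8.0 * len(gzip.compress(raw, compresslevel=9)) / len(raw)


# Toy data (assumption): a frozen coordinate versus a uniformly random one.
rng = random.Random(0)
ordered = [0.5] * 5000
disordered = [rng.random() for _ in range(5000)]

h_ordered = compressed_bits_per_sample(ordered, 0.0, 1.0)
h_disordered = compressed_bits_per_sample(disordered, 0.0, 1.0)
print(f"ordered: {h_ordered:.3f} bits/sample")
print(f"disordered: {h_disordered:.3f} bits/sample")
```

In a real workflow one would presumably compare such a measure between bound and unbound trajectories and calibrate it to thermodynamic units; both steps are outside what this sketch covers.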