Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824-1226, United States.
J Chem Inf Model. 2018 Jan 22;58(1):134-147. doi: 10.1021/acs.jcim.7b00310. Epub 2017 Dec 18.
Protein-ligand (PL) interactions play a key role in many life processes such as molecular recognition, molecular binding, signal transmission, and cell metabolism. Examples of interaction forces include hydrogen bonding, hydrophobic effects, steric clashes, electrostatic contacts, and van der Waals attractions. Currently, a large number of hypotheses and perspectives for modeling these interaction forces are scattered throughout the literature and largely forgotten. Had they been assembled and utilized collectively, they would have substantially improved the accuracy of predicting the binding affinity of protein-ligand complexes. In this work, we present Descriptor Data Bank (DDB), a data-driven platform on the cloud for facilitating multiperspective modeling of PL interactions. DDB is an open-access hub for depositing, hosting, executing, and sharing descriptor extraction tools and data for a large number of interaction modeling hypotheses. The platform also implements a machine-learning (ML) toolbox for automatic descriptor filtering and analysis and for scoring function (SF) fitting and prediction. The descriptor filtering module filters out irrelevant and/or noisy descriptors and produces a compact subset of all available features. We seed DDB with 16 diverse descriptor extraction tools, some developed in-house and some collected from the literature. Together, the tools generate over 2700 descriptors that characterize (i) proteins, (ii) ligands, and (iii) protein-ligand complexes. The in-house descriptors we extract are protein-specific; they are based on pairwise primary and tertiary alignment of protein structures followed by clustering and trilateration. We built and used DDB's ML library to fit SFs to the in-house descriptors and to those collected from the literature. We then evaluated them on several data sets that were constructed to reflect real-world drug screening scenarios.
We found that multiperspective SFs that were constructed using a large number of diverse DDB descriptors capturing various PL interactions in different ways outperformed their single-perspective counterparts in all evaluation scenarios, with an average improvement of more than 15%. We also found that our proposed protein-specific descriptors improve the accuracy of SFs.
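To make the descriptor filtering step concrete, the following is a minimal sketch of one way to drop irrelevant or noisy descriptors before fitting an SF. It is not DDB's actual implementation: the abstract does not state which filtering criteria the platform uses, so the two rules here (discard near-constant descriptors, then discard descriptors weakly correlated with measured binding affinity) are assumptions. The function names, thresholds, descriptor names, and toy values are all hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def filter_descriptors(table, affinity, min_std=1e-8, min_abs_corr=0.3):
    """Return {descriptor name: correlation} for descriptors that pass both filters.

    table maps a descriptor name to its values, one per protein-ligand complex.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    kept = {}
    for name, values in table.items():
        if pstdev(values) <= min_std:
            continue  # near-constant descriptor carries no signal
        r = pearson(values, affinity)
        if abs(r) >= min_abs_corr:
            kept[name] = r  # keep strongly positive or negative descriptors
    return kept


# Toy data: five hypothetical complexes with made-up affinities and descriptors.
affinity = [4.0, 5.0, 6.0, 7.0, 8.0]
table = {
    "hbond_count": [1, 2, 3, 4, 5],      # tracks affinity closely
    "constant_flag": [1, 1, 1, 1, 1],    # zero variance, removed
    "noise": [1, -1, 0, -1, 1],          # uncorrelated with affinity, removed
    "buried_area": [9, 7, 6, 4, 1],      # strongly anti-correlated, kept
}
kept = filter_descriptors(table, affinity)
print(sorted(kept))
```

A univariate filter like this is only a simple stand-in; it cannot detect descriptors that are informative in combination, which is one reason a platform pooling many interaction hypotheses might prefer model-based filtering.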