Department of Biophysics and Biochemistry, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan.
J Chem Inf Model. 2010 Jan;50(1):170-85. doi: 10.1021/ci900382e.
In this study, we developed a new pharmacophore-based interaction fingerprint (Pharm-IF) and examined its usefulness for in silico screening with machine learning techniques, namely support vector machines (SVM) and random forests (RF), in place of similarity-based ranking. Using docking results for PKA, SRC, cathepsin K, carbonic anhydrase II, and HIV-1 protease, we compared the screening efficiency of the Pharm-IF models with that of GLIDE score and of models built on the residue-based IF (PLIF). The combination of SVM and Pharm-IF gave a higher enrichment factor at 10% (5.7 on average) than GLIDE score (4.2) or PLIF (4.3). With respect to training set size, training on more than five crystal structures enabled the machine learning models to stably outperform GLIDE score. We also used the docking poses of known active compounds, in addition to the crystal structures, as positive samples in the training sets. For SRC and cathepsin K, the enrichment factors at 10% of the RF models trained with the docking poses (6.5 and 6.3, respectively) were significantly higher than those of the models trained on the crystal structures alone (3.9 and 3.2, respectively).
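The enrichment factor at 10% quoted above can be illustrated with a minimal sketch. It assumes the common definition (the hit rate among the top-ranked 10% of compounds divided by the hit rate in the whole library); the abstract does not state the exact formula the authors used. The function name `enrichment_factor` and the toy data are hypothetical. Higher scores are assumed to rank better, so a docking score for which lower is better would need to be negated first.

```python
def enrichment_factor(scores, labels, fraction=0.10):
    """Enrichment factor for the top `fraction` of a ranked library.

    scores: one score per compound (assumption: higher = more likely active)
    labels: 1 for a known active, 0 for a decoy, in the same order as scores
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Size of the selected subset; always keep at least one compound.
    n_top = max(1, round(n_total * fraction))

    # Rank compounds from best to worst score and count actives at the top.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top subset relative to the hit rate in the full library.
    return (hits_top / n_top) / (n_actives / n_total)


# Toy library: 20 compounds, 4 actives, both top-ranked compounds are active.
toy_scores = list(range(20, 0, -1))
toy_labels = [1, 1] + [0] * 8 + [1] + [0] * 4 + [1] + [0] * 4
toy_ef = enrichment_factor(toy_scores, toy_labels)
```

In this toy case the top 10% holds 2 compounds, both active, while the library as a whole is 20% active, so the enrichment factor is 5. Under this definition a random ranking gives a value near 1, and the largest possible value at 10% is 10.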