Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, 291, Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea.
Catholic Precision Medicine Research Center, College of Medicine, The Catholic University of Korea, 222, Banpo-daero, Seocho-gu, Seoul, 06591, Republic of Korea.
BMC Bioinformatics. 2017 Dec 28;18(Suppl 16):567. doi: 10.1186/s12859-017-1960-x.
Identifying target molecules is important for "target deconvolution" in phenotypic screening and for understanding the "polypharmacology" of drugs. Because conventional target identification methods are time-consuming and costly, in silico target identification has been considered an alternative. One well-known in silico approach relies on structure-activity relationships (SARs). SAR models have advantages such as low computational cost and high feasibility; however, their dependence on available data leads to an imbalance of active data and ambiguity of inactive data across targets.
We developed a ligand-based virtual screening model comprising 1121 target SAR models built using a random forest algorithm. The performance of each target model was assessed using the ROC curve and the mean score under internal five-fold cross-validation. In addition, recall rates for the top-k targets were calculated to assess the performance of target ranking. A benchmark model using an optimized sampling method and parameters was evaluated on an external validation set. The results show recall rates of 67.6% and 73.9% for top-11 (1% of the total targets) and top-33, respectively. We provide a publicly available website where users can search the top-k targets for query ligands: http://rfqsar.kaist.ac.kr.
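To make the top-k recall measure concrete, the following is a minimal sketch in Python. It assumes that recall at top-k is the fraction of known ligand-target pairs whose true target appears among the k highest-scoring targets for that ligand; this definition, the function names, and the toy scores are illustrative assumptions and are not taken from the published implementation.

```python
from typing import Dict, List


def top_k_targets(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k targets with the highest scores (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [target for target, _ in ranked[:k]]


def recall_at_k(predictions: List[Dict[str, float]],
                known_targets: List[str], k: int) -> float:
    """Fraction of ligands whose known target is ranked within the top k.

    predictions[i] maps target name -> model score for ligand i
    (hypothetical layout); known_targets[i] is that ligand's true target.
    """
    hits = sum(
        1
        for scores, true_target in zip(predictions, known_targets)
        if true_target in top_k_targets(scores, k)
    )
    return hits / len(known_targets)


# Toy data: three ligands scored against three hypothetical targets.
toy_predictions = [
    {"T1": 0.9, "T2": 0.2, "T3": 0.1},
    {"T1": 0.3, "T2": 0.5, "T3": 0.4},
    {"T1": 0.6, "T2": 0.1, "T3": 0.7},
]
toy_known = ["T1", "T3", "T2"]

for k in (1, 2, 3):
    print(k, recall_at_k(toy_predictions, toy_known, k))
```

With 1121 target models, k = 11 corresponds to roughly the top 1% of targets, which is the setting reported above.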
The target models we built can be used both to predict the activity of ligands toward each target and to rank candidate targets for a query ligand using a unified scoring scheme. The scores are additionally fitted to probabilities so that users can estimate how likely a ligand-target interaction is to be active. The user interface of our website is intuitive and offers useful information and cross-references.
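One common way to map raw model scores to probabilities is to fit a sigmoid to scores with known activity labels (Platt-style scaling). The abstract does not state which fitting procedure was used, so the sketch below is only an illustration under that assumption; the function names, learning rate, and toy data are hypothetical.

```python
import math
from typing import List, Tuple


def fit_sigmoid(scores: List[float], labels: List[int],
                lr: float = 0.1, epochs: int = 5000) -> Tuple[float, float]:
    """Fit p = 1 / (1 + exp(-(a * score + b))) by gradient descent
    on the mean logistic loss. Returns the fitted (a, b)."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def to_probability(score: float, a: float, b: float) -> float:
    """Convert a raw score into an estimated probability of activity."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Toy calibration data: raw scores with active (1) / inactive (0) labels.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_sigmoid(toy_scores, toy_labels)
print(to_probability(0.2, a, b), to_probability(0.8, a, b))
```

Because the same mapping can be applied to the scores of every target model, the resulting probabilities are comparable across targets, which is what a unified ranking requires.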