Wang Zhonghua, Liang Lu, Yin Zheng, Lin Jianping
State Key Laboratory of Medicinal Chemical Biology and College of Pharmacy, Nankai University, Weijin Road, Tianjin, China.
State Key Laboratory of Medicinal Chemical Biology and College of Pharmacy, Nankai University, Weijin Road, Tianjin, China; High-Throughput Molecular Drug Discovery Center, Tianjin Joint Academy of Biomedicine and Technology, Tianjin, China.
J Cheminform. 2016 Apr 23;8:20. doi: 10.1186/s13321-016-0130-x. eCollection 2016.
In silico target prediction of compounds plays an important role in drug discovery. The chemical similarity ensemble approach (SEA) is a promising method that has been successfully applied in many drug-related studies. Because SEA can be built on different types of molecular fingerprints, various SEA-like models are available. To investigate the influence of training data selection and the complementarity of different models, several SEA models were constructed and tested.
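SEA compares a query compound (or set of compounds) with the set of known ligands of each target. As a minimal sketch of the raw-score idea, assuming fingerprints are stored as Python sets of on-bit indices and using a hypothetical similarity threshold of 0.5, the raw score sums the pairwise Tanimoto similarities that exceed the threshold. Full SEA additionally normalizes this raw score against scores from random ligand sets to obtain a Z-score and an expectation value; that step is omitted here.

```python
from typing import Sequence, Set

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def sea_raw_score(query: Sequence[Set[int]],
                  target_ligands: Sequence[Set[int]],
                  threshold: float = 0.5) -> float:
    """SEA-style raw score: sum of pairwise Tanimoto values above `threshold`.

    The default threshold is a hypothetical placeholder; real SEA models
    calibrate it per fingerprint type and then convert the raw score into
    a Z-score and E-value (not done here).
    """
    total = 0.0
    for q in query:
        for t in target_ligands:
            sim = tanimoto(q, t)
            if sim > threshold:
                total += sim
    return total

# Toy fingerprints: the first target ligand shares 3 of 5 distinct bits
# with the query, the second shares none.
score = sea_raw_score([{1, 2, 3, 4}], [{1, 2, 3, 5}, {7, 8}])
print(score)  # → 0.6
```

Because each fingerprint type (Atom pair, Topological, Morgan, MACCS, Pharmacophore) yields different bit sets, the same scoring scheme produces a different model per fingerprint.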
When we used a test set of 37,138 positive and 42,928 negative ligand-target interactions, among the five tested molecular fingerprint methods at a significance level of 0.05, the Topological-based model yielded the best precision rate (83.7 %) and F0.25-Measure (0.784), while the Atom pair-based model yielded the best F0.5-Measure (0.694). By employing an election system to combine the five models, a flexible prediction scheme was achieved, with precision ranging from 71 to 90.6 %, F0.5-Measure from 0.663 to 0.684 and F0.25-Measure from 0.696 to 0.817.
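The precision and F-type scores above can be computed from counts of true-positive, false-positive and false-negative predictions. The sketch below assumes the F-type scores follow the standard weighted form F_β = (1 + β²)·P·R / (β²·P + R), in which β < 1 weights precision more heavily than recall; the counts in the example are made up for illustration.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted interactions that are true interactions."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    """Fraction of true interactions that were predicted."""
    return tp / (tp + fn) if tp + fn else 0.0

def f_beta(p: float, r: float, beta: float) -> float:
    """Weighted harmonic mean of precision p and recall r; beta < 1 favors precision."""
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0

# Made-up counts for illustration only.
p = precision(tp=80, fp=20)   # 0.8
r = recall(tp=80, fn=120)     # 0.4
print(round(f_beta(p, r, 0.5), 3))   # → 0.667
print(round(f_beta(p, r, 0.25), 3))  # → 0.756
print(round(f_beta(p, r, 1.0), 3))   # → 0.533
```

Lowering β pulls the score toward the precision value (0.8 here) and away from recall, so a high-precision model can lead on a precision-weighted F-score while trailing on a more balanced one.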
The overall effectiveness of the five models can be ranked in decreasing order as follows: Atom pair ≈ Topological > Morgan > MACCS > Pharmacophore. Combining multiple SEA models, which takes advantage of the strengths of different models, could improve the success rate of prediction. The models might also be improved by using target-specific classes or more active compounds.
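The abstract does not detail how the election system works. One plausible reading, offered here only as an assumption, is a vote threshold: a ligand-target pair is predicted if at least k of the five fingerprint models predict it. In this sketch the helper `elect` and all ligand and target identifiers are hypothetical; only the model names come from the text.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (ligand_id, target_id)

def elect(predictions: Dict[str, Set[Pair]], min_votes: int) -> Set[Pair]:
    """Keep a ligand-target pair if at least `min_votes` models predict it."""
    votes: Counter = Counter()
    for pairs in predictions.values():
        votes.update(pairs)  # each model contributes at most one vote per pair
    return {pair for pair, n in votes.items() if n >= min_votes}

# Invented predictions from the five fingerprint-based models.
predictions = {
    "AtomPair":      {("L1", "T1"), ("L2", "T1"), ("L3", "T2")},
    "Topological":   {("L1", "T1"), ("L2", "T1")},
    "Morgan":        {("L1", "T1"), ("L4", "T3")},
    "MACCS":         {("L1", "T1"), ("L3", "T2"), ("L5", "T3")},
    "Pharmacophore": {("L1", "T1"), ("L5", "T3")},
}

# Sweeping the vote threshold yields progressively stricter prediction sets.
for k in range(1, 6):
    print(k, sorted(elect(predictions, k)))
```

Raising `min_votes` keeps only pairs on which more models agree, which would typically trade recall for precision; under this assumption, that is one way a tunable range of precision values could arise.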