Identification of small molecule aggregators from large compound libraries by support vector machines.

Affiliation

College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China.

Publication information

J Comput Chem. 2010 Mar;31(4):752-63. doi: 10.1002/jcc.21347.

PMID: 19569201

Abstract

Small molecule aggregators non-specifically inhibit multiple unrelated proteins, which makes them therapeutically useless. They frequently appear as false hits in high-throughput screening campaigns and therefore need to be eliminated. Computational methods for identifying aggregators have been explored, but they have not been tested on large compound libraries. We used 1319 aggregators and 128,325 non-aggregators to develop a support vector machine (SVM) model for aggregator identification, and tested it in four ways. The first was five-fold cross-validation, which showed aggregator identification rates comparable to, and non-aggregator identification rates significantly better than, those of earlier studies. The second was an independent test on 17 aggregators discovered independently of the training aggregators, 71% of which were correctly identified. The third was retrospective screening of 13M PUBCHEM and 168K MDDR compounds, of which 97.9% and 98.7%, respectively, were predicted to be non-aggregators. The fourth was retrospective screening of 5527 MDDR compounds similar to the known aggregators, 1.14% of which were predicted to be aggregators. In five-fold cross-validation under the same settings, SVM performed slightly better overall than two other machine learning methods. Molecular features of aggregation, extracted by a feature selection method, are consistent with published profiles. SVM showed substantial capability in identifying aggregators from large libraries at low false-hit rates.
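The percentages in the abstract can be turned into approximate compound counts, which helps in judging how many compounds each test actually flags. The sketch below is illustrative arithmetic only: the helper name `approx_count` is hypothetical, the library sizes are the rounded figures quoted in the abstract (13M and 168K), and the resulting counts are estimates rather than values reported by the authors.

```python
def approx_count(total: int, percent: float) -> int:
    """Convert a reported percentage of `total` into a rounded compound count."""
    return round(total * percent / 100)


# Independent test: 71% of 17 aggregators were correctly identified.
independent_hits = approx_count(17, 71)

# MDDR compounds similar to known aggregators: 1.14% of 5527 predicted as aggregators.
mddr_similar_flagged = approx_count(5527, 1.14)

# Full-library screens report the share predicted as NON-aggregators,
# so the flagged share is the complement (100 - percent).
# Library sizes are the rounded figures from the abstract, so these are rough estimates.
pubchem_flagged = approx_count(13_000_000, 100 - 97.9)
mddr_flagged = approx_count(168_000, 100 - 98.7)

print(f"Independent test aggregators identified: ~{independent_hits} of 17")
print(f"Similar MDDR compounds flagged:          ~{mddr_similar_flagged} of 5527")
print(f"PUBCHEM compounds flagged:               ~{pubchem_flagged:,}")
print(f"MDDR compounds flagged:                  ~{mddr_flagged:,}")
```

Under these assumptions, a predicted-aggregator rate of about 2% still corresponds to a few hundred thousand PUBCHEM compounds, which is why the abstract reports the false-hit rate alongside the identification rate.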
