Yiqun Cao, Tao Jiang, Thomas Girke
Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA.
Bioinformatics. 2008 Jul 1;24(13):i366-74. doi: 10.1093/bioinformatics/btn186.
The prediction of biologically active compounds is of great importance for high-throughput screening (HTS) approaches in drug discovery and chemical genomics. Many computational methods in this area focus on measuring the structural similarities between chemical structures. However, traditional similarity measures are often too rigid or consider only global similarities between structures. The maximum common substructure (MCS) approach provides a more promising and flexible alternative for predicting bioactive compounds.
In this article, we propose a new backtracking algorithm for MCS and compare it with global similarity measures. Our algorithm provides high flexibility in the matching process and is very efficient at identifying local structural similarities. To predict and cluster biologically active compounds more efficiently, we introduce the concept of basis compounds, which lets researchers easily combine MCS-based and traditional similarity measures with modern machine learning techniques. Support vector machines (SVMs) are used to test how the MCS-based similarity measure and the basis compound vectorization method perform on two empirically tested datasets. The test results show that MCS complements the well-known atom pair descriptor-based similarity measure. By combining these two measures, our SVM-based model predicts the biological activities of chemical compounds with higher specificity and sensitivity.
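The abstract describes the method only at a high level. The sketch below illustrates the two ideas under several assumptions. First, molecules are treated as hydrogen-suppressed graphs with atom labels, and bond orders are ignored. Second, the MCS is taken to be a maximum common induced subgraph (possibly disconnected) found by plain exhaustive backtracking. This is not the authors' flexible matching algorithm. Third, the MCS-based similarity is assumed to be a Tanimoto-style ratio |MCS| / (|A| + |B| - |MCS|). Fourth, "basis compound vectorization" is read as describing each compound by its similarities to a fixed set of basis compounds. All function names (`make_mol`, `mcs_size`, `mcs_tanimoto`, `basis_vector`) are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Mol = Tuple[List[str], List[Set[int]]]  # (atom labels, adjacency sets)


def make_mol(labels: Sequence[str], bonds: Sequence[Tuple[int, int]]) -> Mol:
    """Build a hydrogen-suppressed molecular graph; bond orders are ignored here."""
    adj: List[Set[int]] = [set() for _ in labels]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return list(labels), adj


def mcs_size(a: Mol, b: Mol) -> int:
    """Atom count of a maximum common induced subgraph, by exhaustive backtracking."""
    a_lab, a_adj = a
    b_lab, b_adj = b
    n, m = len(a_lab), len(b_lab)
    mapping: Dict[int, int] = {}  # atom index in a -> atom index in b
    used: Set[int] = set()        # atoms of b already matched
    best = 0

    def compatible(i: int, j: int) -> bool:
        if a_lab[i] != b_lab[j]:
            return False
        # Induced matching: i~k in a exactly when j~mapping[k] in b.
        return all((k in a_adj[i]) == (mk in b_adj[j]) for k, mk in mapping.items())

    def backtrack(i: int) -> None:
        nonlocal best
        # Prune when even matching every remaining atom cannot beat the best so far.
        if len(mapping) + min(n - i, m - len(used)) <= best:
            return
        if i == n:
            best = len(mapping)
            return
        for j in range(m):
            if j not in used and compatible(i, j):
                mapping[i] = j
                used.add(j)
                backtrack(i + 1)
                del mapping[i]
                used.remove(j)
        backtrack(i + 1)  # branch where atom i of a stays unmatched

    backtrack(0)
    return best


def mcs_tanimoto(a: Mol, b: Mol) -> float:
    """Assumed Tanimoto-style MCS similarity; molecules must be non-empty."""
    c = mcs_size(a, b)
    return c / (len(a[0]) + len(b[0]) - c)


def basis_vector(mol: Mol, basis: Sequence[Mol]) -> List[float]:
    """Describe mol by its MCS similarity to each (hypothetical) basis compound."""
    return [mcs_tanimoto(mol, b) for b in basis]


ethanol = make_mol(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = make_mol(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
dimethyl_ether = make_mol(["C", "O", "C"], [(0, 1), (1, 2)])
print(basis_vector(ethanol, [propanol, dimethyl_ether]))  # [0.75, 0.5]
```

The resulting fixed-length vectors can be passed to any SVM implementation. Under the same assumption, vectors built from a second similarity measure (for example, atom pairs) could be concatenated with these before training. Exhaustive backtracking is exponential in the worst case, so this sketch suits only small molecules.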
Supplementary data are available at Bioinformatics online.