Zhang Shuxing, Golbraikh Alexander, Oloff Scott, Kohn Harold, Tropsha Alexander
Division of Medicinal Chemistry and Natural Products, School of Pharmacy, CB # 7360 Beard Hall, University of North Carolina, Chapel Hill, North Carolina 27599, USA.
J Chem Inf Model. 2006 Sep-Oct;46(5):1984-95. doi: 10.1021/ci060132x.
A novel automated lazy learning quantitative structure-activity relationship (ALL-QSAR) modeling approach has been developed on the basis of lazy learning theory. The activity of a test compound is predicted from a locally weighted linear regression model using the chemical descriptors and biological activities of the training set compounds most chemically similar to that test compound. The weight with which each training set compound enters the regression depends on its similarity to the test compound. We have applied the ALL-QSAR method to several experimental chemical data sets, including 48 anticonvulsant agents with known ED50 values, 48 dopamine D1-receptor antagonists with known competitive binding affinities (Ki), and a Tetrahymena pyriformis data set containing 250 phenolic compounds with toxicity IGC50 values. When applied to database screening, models developed for anticonvulsant agents identified several known anticonvulsant compounds that were not only absent from the training set but also highly chemically dissimilar to the training set compounds. This initial success indicates that ALL-QSAR can be further exploited as a general tool for accurate bioactivity prediction and database screening in drug design and discovery. Because of its local nature, the ALL-QSAR approach appears to be especially well suited to the development of highly predictive models for sparse or unevenly distributed data sets.
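The prediction step described above (fit a similarity-weighted linear model on the nearest training compounds, then evaluate it at the test compound) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published ALL-QSAR implementation: the function name `predict_local`, the Euclidean distance in descriptor space, the Gaussian kernel, the bandwidth rule, and the default `k` are all hypothetical choices made here for illustration.

```python
import numpy as np

def predict_local(X_train, y_train, x_query, k=10):
    """Predict the activity of one query compound with a locally weighted
    linear regression over its k nearest training compounds.

    Assumptions (not taken from the paper): Euclidean distance on the
    descriptors, Gaussian kernel weights, bandwidth equal to the distance
    of the farthest selected neighbor.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    k = min(k, len(X_train))

    # Distance of every training compound to the query in descriptor space.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    d = dists[nearest]

    # More similar (closer) compounds receive larger weights.
    h = d.max()
    w = np.exp(-(d / h) ** 2) if h > 0 else np.ones_like(d)

    # Weighted least squares with an intercept: scaling rows by sqrt(w)
    # turns the weighted problem into an ordinary least-squares problem.
    A = np.hstack([np.ones((k, 1)), X_train[nearest]])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_train[nearest] * sw, rcond=None)

    # Evaluate the local model at the query compound.
    return float(coef[0] + x_query @ coef[1:])


# Synthetic usage example: activities are an exact linear function of two
# made-up descriptors, so the local model should recover the true value.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 3.0
pred = predict_local(X, y, [0.1, -0.2], k=10)
```

Because a new model is fitted for each query compound, nothing is trained in advance; this is what makes the approach "lazy", and it is also why the fit can adapt to sparsely or unevenly populated regions of descriptor space.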