Shen Min, Béguin Cécile, Golbraikh Alexander, Stables James P, Kohn Harold, Tropsha Alexander
Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7360, USA.
J Med Chem. 2004 Apr 22;47(9):2356-64. doi: 10.1021/jm030584q.
We have developed a drug discovery strategy that employs variable selection quantitative structure-activity relationship (QSAR) models for chemical database mining. The approach starts with the development of rigorously validated QSAR models obtained with the variable selection k nearest neighbor (kNN) method (or, in principle, with any other robust model-building technique). Model validation is based on several statistical criteria: randomization of the target property (Y-randomization), independent assessment of the training set models' predictive power using external test sets, and definition of each model's applicability domain. All successful models are used concurrently for database mining. In each case, only the variables selected during model building (termed the descriptor pharmacophore) are used in chemical similarity searches that compare active training set compounds (queries) with compounds in chemical databases. For external database entries found within a predefined similarity threshold of the training set molecules, the specific biological activity characteristic of the training set compounds is predicted with the validated QSAR models, subject to the applicability domain criteria. Compounds predicted to be highly active by all or most of the models are considered consensus hits. We report the first application of this computational strategy to the discovery of anticonvulsant agents in the Maybridge and National Cancer Institute (NCI) databases, which together contain ca. 250,000 compounds. Forty-eight anticonvulsant agents of the functionalized amino acid (FAA) series were used to build kNN variable selection QSAR models. The 10 best models were applied to mining the chemical databases, and 22 compounds were selected as consensus hits.
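The mining workflow described above (descriptor-pharmacophore similarity search, applicability-domain check, kNN prediction, consensus over models) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Euclidean distance over each model's selected descriptors, a kNN prediction that averages the activities of the k nearest training compounds, an applicability-domain cutoff of mean + Z·σ of the training compounds' average nearest-neighbor distances (Z = 0.5 here), and a simple majority vote for "consensus". The names `KnnModel`, `mine`, `sim_threshold`, and `active_cutoff`, the descriptor subsets, and the toy data are all hypothetical, and the variable-selection step that produces each descriptor subset is not shown.

```python
import math
from statistics import mean, pstdev


def dist(a, b, idx):
    """Euclidean distance restricted to the descriptor indices in idx
    (the model's descriptor pharmacophore)."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))


class KnnModel:
    """Hypothetical kNN QSAR model: a selected descriptor subset plus its training set.
    Variable selection itself is not shown; idx is taken as given."""

    def __init__(self, idx, train_x, train_y, k=3, z=0.5):
        self.idx, self.x, self.y, self.k = idx, train_x, train_y, k
        # Assumed applicability-domain cutoff: mean + z * std of each training
        # compound's average distance to its k nearest training neighbors.
        avg_nn = []
        for i, xi in enumerate(train_x):
            d = sorted(dist(xi, xj, idx) for j, xj in enumerate(train_x) if j != i)
            avg_nn.append(mean(d[:k]))
        self.cutoff = mean(avg_nn) + z * pstdev(avg_nn)

    def predict(self, q):
        """Mean activity of the k nearest training compounds, or None if q is
        outside the applicability domain."""
        nn = sorted((dist(q, xi, self.idx), yi) for xi, yi in zip(self.x, self.y))[: self.k]
        if mean(d for d, _ in nn) > self.cutoff:
            return None
        return mean(y for _, y in nn)


def mine(models, db, queries, sim_threshold, active_cutoff):
    """Return names of database entries predicted active by a majority of models."""
    hits = []
    for name, desc in db.items():
        votes = 0
        for m in models:
            # Similarity search in this model's descriptor space against active queries.
            if min(dist(desc, q, m.idx) for q in queries) > sim_threshold:
                continue
            pred = m.predict(desc)
            if pred is not None and pred >= active_cutoff:
                votes += 1
        if votes > len(models) / 2:  # simple majority vote (assumed consensus rule)
            hits.append(name)
    return hits


# Toy data (hypothetical): three descriptors per compound, activity in arbitrary units.
train_x = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [5, 5, 5], [6, 5, 5], [5, 6, 5]]
train_y = [6.0, 5.8, 6.1, 3.0, 3.2, 2.9]
models = [KnnModel(idx, train_x, train_y, k=2) for idx in ([0, 1], [1, 2], [0, 2])]
queries = [x for x, y in zip(train_x, train_y) if y >= 5.0]  # active training compounds
db = {"A": [0.5, 0.5, 0.2], "B": [5.5, 5.5, 5.5], "C": [20, 20, 20]}
print(mine(models, db, queries, sim_threshold=2.0, active_cutoff=5.0))  # ['A']
```

In this toy run, entry B lies next to the inactive training cluster and is dropped by the similarity search against active queries, while C is far from every training compound and falls outside both the similarity threshold and the applicability domain. Only A, which every model places near the active compounds and predicts to be active, becomes a consensus hit.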
Nine compounds were synthesized and tested at the NIH Epilepsy Branch, Rockville, MD, using the same biological assay that was used to assess the anticonvulsant activity of the training set compounds. Four of the nine were exact database hits, and five were derived from hits by minor chemical modifications. Seven of the nine compounds were confirmed to be active, an exceptionally high hit rate. The approach described in this report can serve as a general tool for rational drug discovery.
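For scale, the counts reported above correspond to the following fractions. The database size is given only as approximately 250,000 compounds, so the second figure is indicative.

```latex
\text{confirmed hit rate} = \frac{7}{9} \approx 0.78 \quad (77.8\%)
\qquad
\text{consensus selection fraction} \approx \frac{22}{250\,000} = 8.8 \times 10^{-5} \quad (\approx 0.009\%)
```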