Boik John C, Newman Robert A
Department of Experimental Therapeutics, University of Texas M. D. Anderson Cancer Center, 8000 El Rio, Houston, TX 77054, USA.
BMC Pharmacol. 2008 Jun 13;8:12. doi: 10.1186/1471-2210-8-12.
Quantitative structure-activity relationship (QSAR) models have become popular tools to help identify promising lead compounds in anticancer drug development. Few QSAR studies have investigated multitask learning, however. Multitask learning is an approach that allows distinct but related data sets to be used in training. In this paper, a suite of three QSAR models is developed to identify compounds that are likely to (a) exhibit cytotoxic behavior against cancer cells, (b) exhibit high rat LD50 values (low systemic toxicity), and (c) exhibit low to modest human oral clearance (favorable pharmacokinetic characteristics). Models were constructed using Kernel Multitask Latent Analysis (KMLA), an approach that can effectively handle a large number of correlated data features, nonlinear relationships between features and responses, and multitask learning. Multitask learning is particularly useful when the number of available training records is small relative to the number of features, as was the case with the oral clearance data.
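The idea that a small data set can borrow strength from a larger, related one can be illustrated with a minimal sketch. It is not KMLA: it swaps in a simpler, well-known technique, kernel ridge regression with a multitask kernel in which every pair of records shares a common term and records from the same task get an extra term. The data are synthetic stand-ins, it fits a continuous response rather than the class labels used in the paper, and every name and parameter (`multitask_kernel`, `coupling`, `gamma`, `lam`) is an assumption made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def multitask_kernel(XA, tasks_a, XB, tasks_b, coupling=1.0, gamma=0.5):
    """Every pair of records shares `coupling`; same-task pairs get an extra 1."""
    same_task = (tasks_a[:, None] == tasks_b[None, :]).astype(float)
    return (coupling + same_task) * rbf_kernel(XA, XB, gamma)

def fit_kernel_ridge(K, y, lam=1e-2):
    """Solve (K + lam * I) alpha = y for the dual coefficients."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

rng = np.random.default_rng(0)

def shared_signal(X):
    # Structure common to both synthetic tasks.
    return np.sin(X[:, 0]) + 0.5 * X[:, 1]

# Task 0: many records. Task 1: few records, related but not identical response.
X_large = rng.normal(size=(200, 5))
X_small = rng.normal(size=(15, 5))
y_large = shared_signal(X_large) + 0.1 * rng.normal(size=200)
y_small = shared_signal(X_small) + 0.2 * X_small[:, 2] + 0.1 * rng.normal(size=15)

# Pool both data sets, keeping track of which task each record belongs to.
X_train = np.vstack([X_large, X_small])
tasks = np.concatenate([np.zeros(200, dtype=int), np.ones(15, dtype=int)])
y_train = np.concatenate([y_large, y_small])

K = multitask_kernel(X_train, tasks, X_train, tasks)
alpha = fit_kernel_ridge(K, y_train)

# Predict for new records of the small task (task 1).
X_new = rng.normal(size=(10, 5))
new_tasks = np.ones(10, dtype=int)
pred_small_task = multitask_kernel(X_new, new_tasks, X_train, tasks) @ alpha
```

A larger `coupling` pulls the two tasks toward one shared model, while a value near zero approaches fitting each task separately. That trade-off is the knob a small data set benefits from.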
Multitask learning modestly but significantly improved the classification precision for the oral clearance model. For the cytotoxicity model, which was constructed using a large number of records, multitask learning did not affect precision but did reduce computation time. The models developed here were used to predict activities for 115,000 natural compounds. Hundreds of natural compounds, particularly in the anthraquinone and flavonoid groups, were predicted to be cytotoxic, have high LD50 values, and have low to moderate oral clearance.
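Screening with a suite of models amounts to keeping the compounds that all three classifiers flag as favorable. A minimal sketch follows, assuming each model outputs a probability of the favorable class. The `screen` function, the 0.5 threshold, and the example scores are hypothetical and not taken from the paper.

```python
import numpy as np

def screen(p_cytotoxic, p_high_ld50, p_low_clearance, threshold=0.5):
    """Return indices of compounds scored at or above threshold by all three models."""
    scores = np.column_stack([p_cytotoxic, p_high_ld50, p_low_clearance])
    return np.flatnonzero((scores >= threshold).all(axis=1))

# Hypothetical predicted probabilities for four compounds.
p_cyto = np.array([0.9, 0.8, 0.2, 0.7])
p_ld50 = np.array([0.6, 0.4, 0.9, 0.8])
p_clr = np.array([0.7, 0.9, 0.8, 0.3])

hits = screen(p_cyto, p_ld50, p_clr)  # only compound 0 passes all three filters
```

Requiring agreement across all three models is a conservative choice. A ranked or weighted combination of the scores would be an alternative when too few compounds pass.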
Multitask learning can be useful in some QSAR models. A suite of QSAR models was constructed and used to screen a large drug library for compounds likely to be cytotoxic to multiple cancer cell lines in vitro, have low systemic toxicity in rats, and have favorable pharmacokinetic properties in humans.