Lane Center for Computational Biology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA.
BMC Bioinformatics. 2014 May 16;15:143. doi: 10.1186/1471-2105-15-143.
Drug discovery and development has been aided by high throughput screening methods that detect compound effects on a single target. However, when using focused initial screening, undesirable secondary effects are often detected late in the development process after significant investment has been made. An alternative approach would be to screen against undesired effects early in the process, but the number of possible secondary targets makes this prohibitively expensive.
This paper describes methods for making this global approach practical by constructing predictive models for many target responses to many compounds and using them to guide experimentation. We demonstrate for the first time that by jointly modeling targets and compounds using descriptive features and using active machine learning methods, accurate models can be built by doing only a small fraction of possible experiments. The methods were evaluated by computational experiments using a dataset of 177 assays and 20,000 compounds constructed from the PubChem database.
An average of nearly 60% of all hits in the dataset were found after exploring only 3% of the experimental space, which suggests that active learning can be used to enable more complete characterization of compound effects than would otherwise be affordable. The methods described are also likely to find widespread application outside drug discovery, such as for characterizing the effects of a large number of compounds or inhibitory RNAs on a large number of cell or tissue phenotypes.
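To make the idea concrete, the following is a minimal sketch of an active learning loop over a compound-by-target experimental space. It is not the authors' implementation: the synthetic data, the one-dimensional features, the similarity-weighted predictor, the greedy selection rule, and all names (`make_dataset`, `score`, `hits_found`) are assumptions chosen only for illustration. For scale, the dataset described above spans 177 × 20,000 = 3,540,000 compound-assay pairs, so 3% of the space corresponds to roughly 106,000 experiments.

```python
import random


def make_dataset(n_compounds=40, n_targets=10, seed=0):
    """Synthetic data (assumption): a pair is a hit when its 1-D features are close."""
    rng = random.Random(seed)
    compounds = [rng.random() for _ in range(n_compounds)]
    targets = [rng.random() for _ in range(n_targets)]
    hits = {
        (c, t): abs(compounds[c] - targets[t]) < 0.05
        for c in range(n_compounds)
        for t in range(n_targets)
    }
    return compounds, targets, hits


def score(pair, observed, compounds, targets):
    """Predicted hit likelihood: similarity-weighted mean of observed outcomes."""
    c, t = pair
    num = den = 0.0
    for (oc, ot), outcome in observed.items():
        dist = abs(compounds[c] - compounds[oc]) + abs(targets[t] - targets[ot])
        weight = 1.0 / (0.01 + dist) ** 2  # closer experiments count more
        num += weight * outcome
        den += weight
    return num / den if den else 0.0


def hits_found(strategy, budget, batch=10, seed=1):
    """Run experiments up to `budget`; return (hits found, total hits)."""
    compounds, targets, hits = make_dataset()
    pool = list(hits)
    random.Random(seed).shuffle(pool)
    # Seed the model with one random batch of experiments.
    observed = {p: hits[p] for p in pool[: min(batch, budget)]}
    while len(observed) < budget:
        remaining = [p for p in pool if p not in observed]
        if strategy == "active":
            # Greedy selection: run the pairs the model thinks are most likely hits.
            # The sort is stable, so ties keep the shuffled (random) order.
            remaining.sort(
                key=lambda p: score(p, observed, compounds, targets), reverse=True
            )
        # For "random", `remaining` is already in shuffled order.
        for p in remaining[: min(batch, budget - len(observed))]:
            observed[p] = hits[p]
    return sum(observed.values()), sum(hits.values())


if __name__ == "__main__":
    for strategy in ("active", "random"):
        found, total = hits_found(strategy, budget=60)
        print(f"{strategy}: found {found} of {total} hits in 60 of 400 experiments")
```

The greedy rule here only exploits the current model; a practical selection strategy would usually also reserve some experiments for exploring uncertain regions of the space, and would use a stronger predictive model built on real descriptive features.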