Computer-Assisted Drug Design, Institute of Pharmaceutical Sciences, Department of Chemistry & Applied Biosciences, Swiss Federal Institute of Technology (ETH Zurich), Vladimir-Prelog-Weg 1-5/10, 8093 Zurich, Switzerland.
Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, 500 Main St, Cambridge, MA 02139, USA.
Future Med Chem. 2017 Mar;9(4):381-402. doi: 10.4155/fmc-2016-0197. Epub 2017 Mar 6.
Computational chemogenomics models the compound-protein interaction space, typically for drug discovery. Existing methods predominantly either incorporate ever larger numbers of bioactivity samples or focus on specific subfamilies of proteins and ligands. As an alternative to modeling entire large datasets at once, active learning adaptively incorporates a minimal number of informative examples, yielding compact but high-quality models. Results/methodology: We assessed active learning for protein/target family-wide chemogenomic modeling in replicate experiments. The results demonstrate that small yet highly predictive models can be extracted from only 10-25% of large bioactivity datasets, irrespective of the molecular descriptors used.
Conclusion: Chemogenomic active learning identifies small subsets of ligand-target interactions in a large screening database that lead to knowledge discovery and highly predictive models.
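The adaptive selection described in the abstract can be illustrated with a minimal pool-based active learning loop. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not name the model or selection strategy, so the example substitutes a small bootstrap committee of 1-nearest-neighbour classifiers and picks the example on which the committee disagrees most. The two-dimensional feature vectors, the labeling rule, and the helper names (`nearest_label`, `committee_vote`, `active_learning`) are all hypothetical stand-ins for real ligand-target descriptors and measured bioactivities.

```python
import random


def nearest_label(x, labeled):
    """Predict with 1-nearest-neighbour over (features, label) pairs."""
    closest = min(
        labeled,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])),
    )
    return closest[1]


def committee_vote(x, labeled, rng, n_members=7):
    """Fraction of bootstrap-trained committee members that predict 1 (active)."""
    votes = 0
    for _ in range(n_members):
        # Each member sees a bootstrap resample of the labeled set.
        sample = [rng.choice(labeled) for _ in labeled]
        votes += nearest_label(x, sample)
    return votes / n_members


def active_learning(pool, oracle, budget, seed=0):
    """Label `budget` pool items, always querying the most contested one."""
    rng = random.Random(seed)
    unlabeled = list(range(len(pool)))
    rng.shuffle(unlabeled)
    # Start from two randomly chosen examples.
    picked = [unlabeled.pop(), unlabeled.pop()]
    labeled = [(pool[i], oracle(i)) for i in picked]
    while len(picked) < budget and unlabeled:
        rng.shuffle(unlabeled)  # random tie-breaking among equal scores
        # A vote fraction near 0.5 means maximal committee disagreement.
        best = min(
            unlabeled,
            key=lambda i: abs(committee_vote(pool[i], labeled, rng) - 0.5),
        )
        unlabeled.remove(best)
        picked.append(best)
        labeled.append((pool[best], oracle(best)))
    return picked, labeled


# Hypothetical data: 100 "ligand-target pairs" with a made-up activity rule.
data_rng = random.Random(1)
pool = [(data_rng.random(), data_rng.random()) for _ in range(100)]
labels = [int(a + b > 1.0) for a, b in pool]

# Label only 20% of the pool, then score the compact model on the full pool.
picked, labeled = active_learning(pool, lambda i: labels[i], budget=20)
accuracy = sum(nearest_label(x, labeled) == y for x, y in zip(pool, labels)) / len(pool)
print(f"labeled {len(picked)} of {len(pool)} pairs; pool accuracy {accuracy:.2f}")
```

The 20% budget is chosen only to fall within the 10-25% range quoted above; the toy data says nothing about the performance reported in the study.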