Murrell Daniel S, Cortes-Ciriano Isidro, van Westen Gerard J P, Stott Ian P, Bender Andreas, Malliavin Thérèse E, Glen Robert C
Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK.
Unite de Bioinformatique Structurale, Structural Biology and Chemistry Department, Institut Pasteur and CNRS UMR 3825, 25-28, rue Dr. Roux, 75 724 Paris, France.
J Cheminform. 2015 Aug 28;7:45. doi: 10.1186/s13321-015-0086-2. eCollection 2015.
In silico predictive models have proved to be valuable for the optimisation of compound potency, selectivity and safety profiles in the drug discovery process.
camb is an R package that provides an environment for the rapid generation of quantitative structure-property and structure-activity models for small molecules (including QSAR, QSPR, QSAM and PCM), and is aimed at both advanced and beginner R users. camb's capabilities include the standardisation of chemical structure representation, computation of 905 one-dimensional and 14 fingerprint-type descriptors for small molecules, 8 types of amino acid descriptors, 13 whole-protein sequence descriptors, filtering methods for feature selection, generation of predictive models (using an interface to the R package caret), and creation of model ensembles (using techniques from the R package caretEnsemble). Results can be visualised through high-quality, customisable plots (R package ggplot2).
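To make the idea of filtering methods for feature selection concrete, the following is a minimal sketch in Python of two common descriptor filters: dropping descriptors with (near-)zero variance and dropping one member of each highly correlated pair. It is not camb's API. The function names, thresholds and toy descriptor values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_descriptors(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Return the names of descriptors that pass both filters.

    columns: dict mapping a descriptor name to its list of values
    (one value per compound). Thresholds are illustrative defaults.
    """
    kept = []
    for name, values in columns.items():
        # Filter 1: discard descriptors that barely vary across compounds.
        if pvariance(values) <= min_variance:
            continue
        # Filter 2: discard descriptors redundant with one already kept.
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue
        kept.append(name)
    return kept


# Hypothetical toy descriptor table for four compounds.
descriptors = {
    "mol_weight": [180.2, 151.2, 206.3, 194.2],
    "mol_weight_x2": [360.4, 302.4, 412.6, 388.4],  # redundant with mol_weight
    "constant_flag": [1.0, 1.0, 1.0, 1.0],          # zero variance
    "logp": [1.2, 0.5, 3.5, -0.1],
}
kept = filter_descriptors(descriptors)
print(kept)
```

In this sketch the first descriptor of a correlated pair is kept simply because it is encountered first; other tie-breaking rules are equally plausible.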
Overall, camb constitutes an open-source framework to perform the following steps: (1) compound standardisation, (2) molecular and protein descriptor calculation, (3) descriptor pre-processing and model training, visualisation and validation, and (4) bioactivity/property prediction for new molecules. camb aims to speed model generation, in order to provide reproducibility and tests of robustness. QSPR and proteochemometric case studies are included which demonstrate camb's application.
Graphical abstract: From compounds and data to models: a complete model building workflow in one package.
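As a rough illustration of the validation and ensembling steps in the workflow above, the sketch below computes a root-mean-square error on held-out values and combines two models by plain averaging. Plain averaging is a deliberately simple stand-in chosen here, not necessarily the ensembling technique camb or caretEnsemble applies. All function names and numbers are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def average_ensemble(prediction_sets):
    """Combine several models by averaging their predictions per compound."""
    return [sum(values) / len(values) for values in zip(*prediction_sets)]


# Hypothetical held-out activities and predictions from two models.
observed = [5.0, 6.0, 7.0, 8.0]
model_a = [5.5, 6.5, 7.5, 8.5]  # consistently over-predicts by 0.5
model_b = [4.5, 5.5, 6.5, 7.5]  # consistently under-predicts by 0.5

ensemble = average_ensemble([model_a, model_b])

print("RMSE model A: ", rmse(observed, model_a))
print("RMSE model B: ", rmse(observed, model_b))
print("RMSE ensemble:", rmse(observed, ensemble))
```

The toy data are constructed so that the two models' errors cancel exactly; with real models the gain from averaging depends on how correlated their errors are.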