School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India.
BMC Bioinformatics. 2019 Feb 4;19(Suppl 13):550. doi: 10.1186/s12859-018-2492-8.
Traditional drug discovery approaches are time-consuming, tedious and expensive. Identifying a potential drug-like molecule with high confidence using high throughput screening (HTS) is a persistent challenge in drug discovery and cheminformatics. Only a small percentage of the molecules that pass through the clinical trial phases receive FDA approval. The whole process takes 10-12 years and millions of dollars of investment. Inconsistency in HTS is a further obstacle to reproducible results. Reproducibility in computational research is highly desirable as a measure for evaluating scientific claims and published findings. This paper describes the development and availability of a knowledge-based predictive model building system that uses the R Statistical Computing Environment, with reproducibility ensured through the Galaxy workflow system.
We describe a web-enabled data mining analysis pipeline that employs reproducible research approaches to address the limited availability of tools for high throughput virtual screening. The pipeline, named "Galaxy for Compound Activity Classification (GCAC)", includes descriptor calculation, feature selection, model building, and screening to extract potent candidates. It does so by combining the capabilities of R statistical packages and literate programming tools within a workflow system environment with automated configuration.
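The stages listed above (feature selection, model building, and screening over computed descriptors) can be illustrated with a minimal sketch. This is not GCAC's implementation, which is built on R packages inside Galaxy. It is a standard-library Python toy that assumes the descriptors have already been computed, uses a simple variance filter for feature selection, and substitutes a nearest-centroid classifier for the model. All function names, compound names, and numbers are hypothetical.

```python
from statistics import mean, pvariance


def select_features(rows, min_variance=1e-8):
    """Return indices of descriptor columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) > min_variance]


def project(row, keep):
    """Keep only the selected descriptor columns of one compound."""
    return [row[i] for i in keep]


def fit_centroids(rows, labels):
    """Build a nearest-centroid model: one mean descriptor vector per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))


def screen(centroids, keep, library):
    """Return names of library compounds predicted to be active."""
    return [name for name, row in library.items()
            if predict(centroids, project(row, keep)) == "active"]


# Hypothetical precomputed descriptors (three per compound) with activity labels.
train_rows = [
    [1.0, 0.5, 3.0],
    [1.2, 0.5, 2.8],
    [4.0, 0.5, 0.9],
    [4.2, 0.5, 1.1],
]
train_labels = ["active", "active", "inactive", "inactive"]

# Feature selection: the constant middle descriptor carries no information.
keep = select_features(train_rows)

# Model building on the reduced descriptor set.
model = fit_centroids([project(r, keep) for r in train_rows], train_labels)

# Screening a small hypothetical compound library.
library = {"cmpd_A": [1.1, 0.5, 2.9], "cmpd_B": [3.9, 0.5, 1.0]}
hits = screen(model, keep, library)
print(hits)
```

In this toy run the constant descriptor is dropped and only the library compound that sits near the active centroid is returned as a hit. A real pipeline would add descriptor calculation from chemical structures, cross-validated model selection, and a recorded workflow so that every step can be rerun.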
GCAC can serve as a standard for screening drug candidates through predictive model building in the Galaxy environment, allowing for easy installation and reproducibility. A demo site of the tool is available at http://ccbb.jnu.ac.in/gcac.