GCAC: galaxy workflow system for predictive model building for virtual screening.

Affiliation

School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India.

Publication information

BMC Bioinformatics. 2019 Feb 4;19(Suppl 13):550. doi: 10.1186/s12859-018-2492-8.

DOI:10.1186/s12859-018-2492-8

PMID:30717669

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7394323/

Abstract

BACKGROUND

Traditional drug discovery approaches are time-consuming, tedious and expensive. Identifying a potential drug-like molecule with high confidence using high-throughput screening (HTS) remains a challenging task in drug discovery and cheminformatics. Only a small percentage of the molecules that pass the clinical trial phases receive FDA approval. The whole process takes 10-12 years and millions of dollars of investment. Inconsistency in HTS results is a further obstacle to reproducibility. Reproducibility in computational research is highly desirable as a means of evaluating scientific claims and published findings. This paper describes the development and availability of a knowledge-based predictive model building system that uses the R Statistical Computing Environment, with reproducibility ensured by the Galaxy workflow system.

RESULTS

We describe a web-enabled data mining analysis pipeline that employs reproducible research approaches to address the limited availability of tools for high-throughput virtual screening. The pipeline, named "Galaxy for Compound Activity Classification (GCAC)", includes descriptor calculation, feature selection, model building, and screening to extract potent candidates. It leverages the combined capabilities of R statistical packages and literate programming tools, contained within a workflow system environment with automated configuration.
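The four stages named above (descriptor calculation, feature selection, model building, screening) can be illustrated with a minimal sketch. This is not GCAC's implementation, which is built on R packages inside Galaxy. The toy string-based "descriptors", the variance filter, the nearest-centroid classifier, all function names, and the training labels are assumptions made up for illustration only.

```python
from statistics import mean, pvariance

# Toy "descriptors" computed from a SMILES string. These are stand-ins
# for real molecular descriptors and are NOT what GCAC computes.
def descriptors(smiles):
    return [
        float(len(smiles)),        # crude size proxy
        float(smiles.count("N")),  # nitrogen count
        float(smiles.count("O")),  # oxygen count
        float(smiles.count("=")),  # double-bond count
        1.0,                       # constant column, carries no information
    ]

def select_features(rows, min_variance=1e-9):
    """Return indices of columns whose variance exceeds min_variance."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if pvariance([r[j] for r in rows]) > min_variance]

def project(row, keep):
    """Keep only the selected descriptor columns."""
    return [row[j] for j in keep]

def fit_centroids(rows, labels):
    """Nearest-centroid 'model': one mean vector per class label."""
    centroids = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def activity_score(row, centroids):
    """Positive when the row is closer to the 'active' centroid."""
    return sq_dist(row, centroids["inactive"]) - sq_dist(row, centroids["active"])

def screen(library, keep, centroids):
    """Rank library compounds by score, keeping those predicted active."""
    scored = [(activity_score(project(descriptors(s), keep), centroids), s)
              for s in library]
    return [s for score, s in sorted(scored, reverse=True) if score > 0]

# Made-up training data; the activity labels are invented for illustration.
train_smiles = ["CCN", "CCCN", "NCCN", "CCO", "CCCO", "OCCO"]
train_labels = ["active", "active", "active",
                "inactive", "inactive", "inactive"]

X = [descriptors(s) for s in train_smiles]           # 1. descriptor calculation
keep = select_features(X)                            # 2. feature selection
model = fit_centroids([project(r, keep) for r in X],
                      train_labels)                  # 3. model building
hits = screen(["CCCCN", "CCCCO", "NCCCN"], keep, model)  # 4. screening
print(hits)
```

With this invented data, the feature selection step drops the two columns that never vary (double-bond count and the constant), and the screening step returns the two nitrogen-containing library compounds. A real workflow would substitute validated descriptors and a cross-validated classifier at the corresponding stages.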

CONCLUSION

GCAC can serve as a standard for screening drug candidates using predictive model building in the Galaxy environment, allowing for easy installation and reproducibility. A demo site of the tool is available at http://ccbb.jnu.ac.in/gcac.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bba/7394323/358217e3766c/12859_2018_2492_Fig1_HTML.jpg
