New developments on the cheminformatics open workflow environment CDK-Taverna.

Affiliation

Chemoinformatics and Metabolism, European Bioinformatics Institute (EBI), Cambridge, UK.

Publication information

J Cheminform. 2011 Dec 13;3:54. doi: 10.1186/1758-2946-3-54.

DOI: 10.1186/1758-2946-3-54

PMID: 22166170

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3292505/

Abstract

BACKGROUND

The computational processing and analysis of small molecules is at the heart of cheminformatics and structural bioinformatics and their application in, e.g., metabolomics or drug discovery. Pipelining or workflow tools allow for the Lego™-like, graphical assembly of I/O modules and algorithms into a complex workflow that can be easily deployed, modified and tested without the hassle of implementing it as a monolithic application. The CDK-Taverna project aims to build a free, open-source cheminformatics pipelining solution by combining different open-source projects such as Taverna, the Chemistry Development Kit (CDK) and the Waikato Environment for Knowledge Analysis (WEKA). A first integrated version 1.0 of CDK-Taverna was recently released to the public.

RESULTS

The CDK-Taverna project was migrated to the most up-to-date versions of its foundational software libraries, with a complete re-engineering of its worker architecture (version 2.0). 64-bit computing and multi-core usage through parallel threads are now supported to allow for fast in-memory processing and analysis of large sets of molecules. Earlier deficiencies, such as workarounds for iterative data reading, have been removed. The reaction enumeration features related to combinatorial chemistry are considerably enhanced. Additional functionality for calculating a natural product likeness score for small molecules has been implemented to identify possible drug candidates. Finally, the data analysis capabilities are extended with new workers that provide access to the open-source WEKA library for clustering and machine learning as well as training and test set partitioning. The new features are outlined with usage scenarios.
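
To make the idea of training and test set partitioning concrete, the following is a minimal sketch of a simple seeded random split. It is not the CDK-Taverna or WEKA implementation, whose partitioning workers may use different strategies; the function name `split_train_test`, its parameters, and the example SMILES strings are all hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    items: Sequence[T], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly partition items into (training, test) lists.

    Assumption: a plain random split; real workflows may instead
    partition by cluster membership or other criteria.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A dedicated, seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical input: a small set of molecules given as SMILES strings.
molecules = [
    "CCO", "c1ccccc1", "CC(=O)O", "CCN", "C1CCCCC1",
    "CC(C)O", "CCCC", "c1ccncc1", "CO", "CC=O",
]
train, test = split_train_test(molecules, test_fraction=0.3, seed=42)
```

Fixing the seed means the same molecules end up in the same partition on every run, which helps when comparing machine learning models trained on the same data.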

CONCLUSIONS

CDK-Taverna 2.0, as an open-source cheminformatics workflow solution, has matured into a freely available and increasingly powerful tool for the biosciences. The combination of the new CDK-Taverna worker family with the already available workflows developed by a lively Taverna community and published on myexperiment.org enables molecular scientists to quickly calculate, process and analyse molecular data as typically found in, e.g., today's systems biology scenarios.
