Automatically detecting workflows in PubChem.

Author information

Calhoun Bradley T, Browning Michael R, Chen Brian R, Bittker Joshua A, Swamidass S Joshua

Affiliation

Washington University School of Medicine, 660 S. Euclid, St Louis, MO 63108, USA.

Publication information

J Biomol Screen. 2012 Sep;17(8):1071-9. doi: 10.1177/1087057112449054. Epub 2012 Jun 12.

DOI:10.1177/1087057112449054

PMID:22693105

Abstract

Public databases that store the data from small-molecule screens are a rich and untapped resource of chemical and biological information. However, screening databases are unorganized, which makes interpreting their data difficult. We propose a method of inferring workflow graphs--which encode the relationships between assays in screening projects--directly from screening data and using these workflows to organize each project's data. On the basis of four heuristics regarding the organization of screening projects, we designed an algorithm that extracts a project's workflow graph from screening data. Where possible, the algorithm is evaluated by comparing each project's inferred workflow to its documentation. In the majority of cases, there are no discrepancies between the two. Most errors can be traced to points in the project where screeners chose additional molecules to test based on structural similarity to promising molecules, a case our algorithm is not yet capable of handling. Nonetheless, these workflows accurately organize most of the data and also provide a method of visualizing a screening project. This method is robust enough to build a workflow-oriented front-end to PubChem and is currently being used regularly by both our lab and our collaborators. A Python implementation of the algorithm is available online, and a searchable database of all PubChem workflows is available at http://swami.wustl.edu/flow.
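The abstract does not state the four heuristics, so the following is only a minimal sketch of the general idea of inferring a workflow graph from screening data. It relies on one assumed rule that is not taken from the paper: an assay is linked to the smallest other assay whose set of tested compounds strictly contains its own. The function name `infer_workflow_edges`, the assay names, and the compound IDs are all hypothetical, and this is not the authors' algorithm.

```python
from typing import Dict, List, Set, Tuple


def infer_workflow_edges(tested: Dict[str, Set[int]]) -> List[Tuple[str, str]]:
    """Return sorted (parent, child) edges under an assumed subset rule.

    Assumption (not from the paper): a child assay follows the assay with the
    smallest tested-compound set that strictly contains the child's set.
    """
    edges = []
    for child, child_set in tested.items():
        # Candidate parents tested a strict superset of the child's compounds.
        parents = [
            assay
            for assay, compounds in tested.items()
            if assay != child and child_set < compounds
        ]
        if parents:
            # Pick the tightest superset; break ties by name for determinism.
            parent = min(parents, key=lambda a: (len(tested[a]), a))
            edges.append((parent, child))
    return sorted(edges)


# Hypothetical project: assay names and compound IDs are made up.
project = {
    "primary": {1, 2, 3, 4, 5, 6},
    "confirmatory": {2, 4, 5},
    "dose_response": {2, 5},
    "counter_screen": {4, 5},
}

edges = infer_workflow_edges(project)
for parent, child in edges:
    print(f"{parent} -> {child}")
```

A pure subset rule like this would miss the case the abstract names as the main source of error: assays that include extra molecules chosen for structural similarity to promising hits, since those molecules never appeared in an upstream assay.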
