Ekins Sean, Clark Alex M, Dole Krishna, Gregory Kellan, Mcnutt Andrew M, Spektor Anna Coulon, Weatherall Charlie, Litterman Nadia K, Bunin Barry A
Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA.
Collaborative Drug Discovery, Inc., Burlingame, CA, USA.
Methods Mol Biol. 2018;1755:197-221. doi: 10.1007/978-1-4939-7724-6_14.
We are now seeing the benefit of investments made over the last decade in high-throughput screening (HTS), which are resulting in large structure-activity datasets entering public and open databases such as ChEMBL and PubChem. The growth of academic HTS centers and the increasing move to academia for early-stage drug discovery suggest a great need for informatics tools and methods to mine such data and learn from it. Collaborative Drug Discovery, Inc. (CDD) has developed a number of tools for storing, mining, securely and selectively sharing, and learning from such HTS data. We present a new web-based data mining and visualization module directly within the CDD Vault platform for high-throughput drug discovery data that makes use of a novel technology stack following modern reactive design principles. We also describe CDD Models within the CDD Vault platform, which enables researchers to share models, share predictions from models, and create models from distributed, heterogeneous data. Our system is built on top of the Collaborative Drug Discovery Vault Activity and Registration data repository ecosystem, which allows users to manipulate and visualize thousands of molecules in real time. This can be performed in any browser on any platform. In this chapter we present examples of its use with public datasets in CDD Vault. Such approaches can complement other cheminformatics tools, whether open source or commercial, in providing approaches for data mining and modeling of HTS data.