
Application of Bioactivity Profile-Based Fingerprints for Building Machine Learning Models.

Author information

Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Pepparedsleden 1, 43153 Mölndal, Sweden.

Intel Corporation, Data Center Group, Veldkant 31, 2550 Kontich, Belgium.

Publication information

J Chem Inf Model. 2019 Mar 25;59(3):962-972. doi: 10.1021/acs.jcim.8b00550. Epub 2018 Nov 29.

Abstract

The volume of high-throughput screening data has increased considerably since the beginning of the era of automated biochemical and cell-based assays. This information-rich data source provides tremendous repurposing opportunities for data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput screening fingerprints (HTSFPs), a new type of descriptor that captures molecular bioactivity profiles and can be applied in virtual screening, iterative screening, and target deconvolution. However, so far, studies around HTSFPs and machine learning have mainly focused on predicting the outcome of molecules in single high-throughput assays, and no one has reported the modeling of compounds' biochemical assay activities toward a panel of target proteins. In this article, we compare how our in-house HTSFPs perform at this task when combined with multitask deep learning versus the single-task support vector machine method, in terms of both hit identification and scaffold-hopping potential. The performance of the two HTSFP models is reported relative to that of multitask deep learning and support vector machine models built with the structural descriptor ECFP. Moreover, we investigated the effect of high-throughput screening false positives and false negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance but diverse hits with very little overlap, thus demonstrating the orthogonality of bioactivity profile-based descriptors and structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold-hopping potential of the predictive models.
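To make the idea of a bioactivity-profile fingerprint concrete, the following is a minimal sketch, not the authors' implementation. It assumes each compound has a dictionary of per-assay activity scores and that untested assays are filled with a placeholder value; the names `build_htsfp`, `hit_overlap`, and the toy compound and assay identifiers are all hypothetical. The overlap function uses the Jaccard index as one simple way to quantify how little two models' hit lists share; the abstract does not state which overlap measure was used.

```python
def build_htsfp(assay_results, assay_ids, missing=0.0):
    """Build one bioactivity vector per compound (illustrative only).

    assay_results: dict mapping compound id -> {assay id: activity score}
    assay_ids: ordered list of assays defining the fingerprint positions
    missing: placeholder for assays in which a compound was never tested
             (an assumption; real pipelines may treat gaps differently)
    """
    fingerprints = {}
    for compound, scores in assay_results.items():
        fingerprints[compound] = [scores.get(assay, missing) for assay in assay_ids]
    return fingerprints


def hit_overlap(hits_a, hits_b):
    """Jaccard overlap between two sets of predicted hits."""
    a, b = set(hits_a), set(hits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data: two compounds, three assays, binary activity outcomes.
toy_results = {
    "cpd1": {"assayA": 1.0, "assayB": 0.0},
    "cpd2": {"assayA": 0.0, "assayC": 1.0},
}
htsfp = build_htsfp(toy_results, ["assayA", "assayB", "assayC"])
print(htsfp)

# Hits predicted by two hypothetical models (e.g. one HTSFP-based, one ECFP-based).
print(hit_overlap(["cpd1", "cpd3"], ["cpd3", "cpd4"]))
```

A low overlap value between the hit lists of an HTSFP-based model and an ECFP-based model would correspond to the orthogonality the abstract describes.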

