Yang Jeremy J, Ursu Oleg, Lipinski Christopher A, Sklar Larry A, Oprea Tudor I, Bologa Cristian G
Translational Informatics Division, Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, NM 87131 USA.
10 Connshire Drive, Waterford, CT 06385-4122 USA.
J Cheminform. 2016 May 28;8:29. doi: 10.1186/s13321-016-0137-3. eCollection 2016.
BACKGROUND: Bioassay data analysis continues to be an essential, routine, yet challenging task in modern drug discovery and chemical biology research. The challenge is to infer reliable knowledge from large, noisy data. Some aspects of this problem are general, with solutions informed by existing and emerging data science best practices. Others are domain specific and rely on expertise in bioassay methodology and chemical biology. Testing compounds for biological activity requires complex and innovative methodology, producing results that vary widely in accuracy, precision, and information content. Hit selection involves optimizing criteria so that the overall probability of success in a project is maximized and resource-wasteful "false trails" are avoided. This "fail-early" approach is embraced in both pharmaceutical and academic drug discovery, since follow-up capacity is resource-limited. Thus, early identification of likely promiscuous compounds has practical value.
RESULTS: Here we describe Badapple (bioassay-data associative promiscuity pattern learning engine), an algorithm for identifying likely promiscuous compounds via associated scaffolds, which combines general and domain-specific features to assist and accelerate drug discovery informatics. Results are described from an analysis using data from MLP assays obtained via the BioAssay Research Database (BARD), http://bard.nih.gov. Specific examples are analyzed in the context of medicinal chemistry to illustrate associations with mechanisms of promiscuity. Badapple has been developed at UNM and released and deployed for public use in two ways: (1) as a BARD plugin, integrated into the public BARD REST API and BARD web client; and (2) as a public web app hosted at UNM.
CONCLUSIONS: Badapple is a method for rapidly identifying likely promiscuous compounds via associated scaffolds. Badapple generates a score associated with a pragmatic, empirical definition of promiscuity, with the overall goal of identifying "false trails" and streamlining workflows. Unlike methods reliant on expert curation of chemical substructure patterns, Badapple is fully evidence-driven, automated, self-improving via integration of additional data, and focused on scaffolds. Badapple is robust with respect to noise and errors, and skeptical of scanty evidence.