From bird's eye views to molecular communities: two-layered visualization of structure-activity relationships in large compound data sets.

Author information

Kayastha Shilva, Kunimoto Ryo, Horvath Dragos, Varnek Alexandre, Bajorath Jürgen

Affiliations

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, 53113, Bonn, Germany.

Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France.

Publication information

J Comput Aided Mol Des. 2017 Nov;31(11):961-977. doi: 10.1007/s10822-017-0070-1. Epub 2017 Oct 6.

Abstract

The analysis of structure-activity relationships (SARs) becomes rather challenging when large and heterogeneous compound data sets are studied. In such cases, many different compounds and their activities need to be compared, which quickly goes beyond the capacity of subjective assessments. For a comprehensive large-scale exploration of SARs, computational analysis and visualization methods are required. Herein, we introduce a two-layered SAR visualization scheme specifically designed for increasingly large compound data sets. The approach combines a new compound pair-based variant of generative topographic mapping (GTM), a machine learning approach for nonlinear mapping, with chemical space networks (CSNs). The GTM component provides a global view of the activity landscapes of large compound data sets, in which informative local SAR environments are identified, augmented by a numerical SAR scoring scheme. Prioritized local SAR regions are then projected into CSNs that resolve these regions at the level of individual compounds and their relationships. Analysis of CSNs makes it possible to distinguish between regions having different SAR characteristics and select compound subsets that are rich in SAR information.
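
To make the second layer more concrete, the following is a minimal sketch of how a chemical space network might be assembled and inspected. It is not the authors' implementation. The abstract does not specify the edge criterion or the SAR scoring scheme, so this sketch assumes a Tanimoto similarity threshold on toy feature sets and uses the largest potency difference across an edge as a crude stand-in for an SAR score. The GTM layer is not shown. All names (`tanimoto`, `build_csn`, `components`, `max_potency_gap`) and all data values are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two feature sets (assumed similarity measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_csn(fingerprints, threshold=0.5):
    """Build an adjacency dict; two compounds share an edge when their
    similarity reaches the (assumed) threshold."""
    graph = {name: set() for name in fingerprints}
    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def components(graph):
    """Return connected components, i.e. candidate compound communities."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result


def max_potency_gap(graph, potency):
    """Largest potency difference across any edge: a crude, assumed proxy
    for SAR discontinuity, not the paper's scoring scheme."""
    gaps = [abs(potency[u] - potency[v]) for u in graph for v in graph[u]]
    return max(gaps, default=0.0)


# Toy data: feature sets stand in for molecular fingerprints,
# potency values stand in for measured activities.
fingerprints = {
    "c1": {1, 2, 3, 4},
    "c2": {1, 2, 3, 5},
    "c3": {1, 2, 3, 4, 6},
    "c4": {10, 11, 12},
    "c5": {10, 11, 13},
}
potency = {"c1": 6.0, "c2": 8.5, "c3": 6.2, "c4": 5.0, "c5": 5.1}

graph = build_csn(fingerprints, threshold=0.5)
communities = components(graph)
largest_gap = max_potency_gap(graph, potency)
```

In this toy network, a community whose neighboring compounds differ strongly in potency would be the kind of region that is rich in SAR information, whereas a community with nearly uniform potency carries little. In practice, fingerprints would come from a cheminformatics toolkit rather than hand-written sets.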
