Maniyar Dharmesh M, Nabney Ian T, Williams Bruce S, Sewing Andreas
Neural Computing Research Group, Information Engineering, Aston University, Birmingham, B4 7ET, United Kingdom.
J Chem Inf Model. 2006 Jul-Aug;46(4):1806-18. doi: 10.1021/ci050471a.
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during the early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding the multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand the data and make meaningful decisions. We also benchmark these principled methods against better-known visualization approaches, namely principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets that arise during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and to understand them better than the benchmark methods do. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool lets the domain experts explore the projections obtained from the visualization algorithms and provides facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry-standard software.
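As an illustration of one of the benchmark methods named above, the following minimal sketch computes Sammon's stress, the objective that Sammon's mapping minimizes: the normalized mismatch between pairwise distances in the original data space and in the projection. The abstract does not say how the projections were scored, so this is not the authors' evaluation procedure; the function name `sammon_stress` and the toy points are hypothetical.

```python
import math
from itertools import combinations


def sammon_stress(high_dim, low_dim):
    """Return Sammon's stress between data points and their projections.

    high_dim: sequence of points in the original data space.
    low_dim:  sequence of the same points after projection.
    A value of 0 means every pairwise distance is preserved exactly.
    """
    if len(high_dim) != len(low_dim):
        raise ValueError("both point sets must have the same length")

    total = 0.0  # sum of original pairwise distances (the normalizer)
    error = 0.0  # accumulated weighted squared distance mismatch
    for i, j in combinations(range(len(high_dim)), 2):
        d_star = math.dist(high_dim[i], high_dim[j])  # distance in data space
        d_proj = math.dist(low_dim[i], low_dim[j])    # distance in projection
        if d_star == 0.0:
            continue  # assumption: skip coincident points to avoid dividing by zero
        total += d_star
        error += (d_star - d_proj) ** 2 / d_star

    if total == 0.0:
        raise ValueError("all points coincide; stress is undefined")
    return error / total


# Hypothetical toy data: three 3-D points and a 2-D projection that
# simply drops the third coordinate.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
projection = [(x, y) for x, y, _ in points]
stress = sammon_stress(points, projection)
```

Because small original distances get large weights, a lower stress favors projections that preserve local neighborhoods, which is one way projections of the same data set could be compared.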