Sosnin Sergey
Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090, Vienna, Austria.
J Cheminform. 2024 Aug 12;16(1):98. doi: 10.1186/s13321-024-00888-z.
The exponential growth of data challenges human analysts, whose capacity to examine data manually is limited. In chemistry in particular, there is demand for tools that visualize molecular datasets in a convenient graphical form. We propose a new, ready-to-use, open-source, multi-tool framework for visualizing and navigating chemical space. The framework follows the low-code/no-code (LCNC) paradigm and provides a KNIME node, a web-based tool, and a Python package, making it accessible to a broad cheminformatics community. At its core, the MolCompass framework uses a pre-trained parametric t-SNE model. We demonstrate how the framework can be used to visualize chemical space and to visually validate binary classification QSAR/QSPR models, revealing their weaknesses and identifying model cliffs. All parts of the framework are publicly available on GitHub.
Scientific contribution: We provide an open-source, ready-to-use set of tools for the visualization of chemical space. These tools can help chemists analyze compound datasets and visually validate QSAR/QSPR models.
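The idea of visually validating a binary classifier over a 2D map of chemical space can be illustrated with a minimal sketch. This is not the MolCompass API: the function name `error_by_region`, its parameters, and the toy data are hypothetical, and the 2D coordinates are assumed to come from some projection (such as parametric t-SNE) that is not computed here. The sketch bins compounds into grid cells and averages the absolute prediction error per cell, so that regions where the model performs poorly stand out.

```python
import numpy as np


def error_by_region(coords, y_true, y_prob, bins=4):
    """Mean absolute prediction error per grid cell of a 2D projection.

    coords : (n, 2) array-like of 2D coordinates (assumed precomputed).
    y_true : (n,) array-like of 0/1 class labels.
    y_prob : (n,) array-like of predicted probabilities for class 1.
    Returns a (bins, bins) array; cells without compounds hold NaN.
    """
    coords = np.asarray(coords, dtype=float)
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_prob, dtype=float))
    x, y = coords[:, 0], coords[:, 1]
    # Sum of errors per cell, then compound counts on the same grid edges.
    err_sum, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=err)
    counts, _, _ = np.histogram2d(x, y, bins=[xedges, yedges])
    # Divide only where a cell contains compounds; empty cells become NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, err_sum / counts, np.nan)


# Toy data (hypothetical): two compounds in the lower-left cell, two in the upper-right.
coords = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9]]
grid = error_by_region(coords, y_true=[1, 1, 0, 0], y_prob=[0.9, 0.8, 0.9, 0.7], bins=2)
print(np.round(grid, 2))
```

Cells with a high mean error mark regions of the map that may merit closer inspection; an interactive tool would typically show the same information by coloring individual compounds rather than by printing a grid.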