Scalfani Vincent F, Patel Vishank D, Fernandez Avery M
University Libraries, Rodgers Library for Science and Engineering, The University of Alabama, Tuscaloosa, AL, 35487, USA.
J Cheminform. 2022 Dec 28;14(1):87. doi: 10.1186/s13321-022-00664-x.
This article demonstrates how to create Chemical Space Networks (CSNs) using a Python RDKit and NetworkX workflow. A CSN is a type of network visualization that depicts compounds as nodes connected by edges, where each edge is defined by a pairwise relationship such as a 2D fingerprint similarity value. A step-by-step approach is presented for creating two different CSNs, one based on RDKit 2D fingerprint Tanimoto similarity values and another based on maximum common substructure similarity values. The tutorial covers several CSN visualization features, including coloring nodes by bioactivity attribute value, drawing edges with different line styles based on similarity value, and replacing the circle nodes with 2D structure depictions. Finally, some common network property and analysis calculations are presented, including the clustering coefficient, degree assortativity, and modularity. All code is provided in the form of Jupyter Notebooks and is available on GitHub with a permissive BSD-3 open-source license: https://github.com/vfscalfani/CSN_tutorial.
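To make the abstract's terms concrete, the following is a minimal sketch of the core CSN idea: compute pairwise Tanimoto similarities, keep pairs at or above a threshold as edges, and then calculate the average clustering coefficient and modularity. It is not the authors' notebook code. It uses only the Python standard library, and the compound names, the toy fingerprints (sets of "on" bit indices standing in for RDKit 2D fingerprints), the 0.5 threshold, and all helper function names are assumptions made for illustration. In the article's workflow, RDKit and NetworkX would handle these steps.

```python
from itertools import combinations

# Hypothetical toy fingerprints: each compound is a set of "on" bit indices.
# These stand in for RDKit 2D fingerprints and are not real chemical data.
fingerprints = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {1, 2, 3, 5},
    "cpd_C": {1, 2, 4, 5},
    "cpd_D": {7, 8, 9},
    "cpd_E": {7, 8, 10},
}


def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by all distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_edges(fps, threshold):
    """Return {(u, v): similarity} for every pair at or above the threshold."""
    edges = {}
    for u, v in combinations(sorted(fps), 2):
        sim = tanimoto(fps[u], fps[v])
        if sim >= threshold:
            edges[(u, v)] = sim
    return edges


def adjacency(nodes, edges):
    """Build an undirected adjacency mapping; isolated nodes are kept."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def modularity(adj, communities):
    """Newman modularity of a given partition of an unweighted graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    if m == 0:
        return 0.0
    q = 0.0
    for comm in communities:
        intra = sum(1 for u, v in combinations(sorted(comm), 2) if v in adj[u])
        degree_sum = sum(len(adj[n]) for n in comm)
        q += intra / m - (degree_sum / (2 * m)) ** 2
    return q


edges = build_edges(fingerprints, threshold=0.5)  # threshold is an assumption
adj = adjacency(fingerprints, edges)
partition = [{"cpd_A", "cpd_B", "cpd_C"}, {"cpd_D", "cpd_E"}]

for (u, v), sim in edges.items():
    print(f"{u} -- {v}: Tanimoto = {sim:.2f}")
print("average clustering coefficient:", average_clustering(adj))
print("modularity of the chosen partition:", modularity(adj, partition))
```

The similarity threshold controls edge density, so raising it prunes weaker relationships and tends to split the network into smaller components. The partition passed to `modularity` is supplied by hand here; a real analysis would typically obtain it from a community detection algorithm.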