Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, CH-3012 Berne, Switzerland.
J Chem Inf Model. 2010 Nov 22;50(11):1924-34. doi: 10.1021/ci100237q. Epub 2010 Oct 14.
The PubChem database was classified using 42 integer-valued descriptors of molecular structure, here called molecular quantum numbers (MQNs), which count atoms and bond types, polar groups, and topological features. Principal component analysis of the MQN data set shows that PubChem compounds occupy a partially filled elliptical cone in (PC1, PC2, PC3) space. The axis of the cone is the first principal component PC1 (65% of the variability), representing molecular size, and the axes of the ellipse are PC2 (18% of the variability), representing structural flexibility, and PC3 (7% of the variability), representing polarity. A visual overview of PubChem is provided by color-coded representations of the (PC2, PC3) plane. The MQNs form a scalar fingerprint that can be used to measure the similarity between pairs of molecules and that enables ligand-based virtual screening, as illustrated by the enrichment of bioactives of the DUD data set from PubChem. An MQN-annotated version of PubChem with an MQN-similarity search tool is available at www.gdb.unibe.ch.
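To make the idea of a scalar fingerprint concrete, the following minimal sketch ranks a small library by distance to a query vector. The abstract does not name the distance metric, so the city-block (Manhattan) distance is an assumption here. The function names, the molecule labels, and the five-component toy vectors are hypothetical and stand in for real 42-component MQN vectors.

```python
from typing import Dict, List, Sequence, Tuple


def city_block_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute component-wise differences between two count vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(
    query: Sequence[int],
    library: Dict[str, Sequence[int]],
    top_k: int = 3,
) -> List[Tuple[str, int]]:
    """Return the top_k library entries closest to the query (smallest distance first)."""
    scored = [(name, city_block_distance(query, vec)) for name, vec in library.items()]
    # Sort by distance, then by name so that ties are ordered deterministically.
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:top_k]


# Toy data (assumption): 5 integer counts per molecule instead of the 42 MQNs.
query = [6, 0, 1, 6, 1]
library = {
    "mol_a": [6, 0, 1, 6, 1],
    "mol_b": [7, 0, 1, 6, 1],
    "mol_c": [6, 1, 2, 3, 0],
    "mol_d": [12, 2, 0, 10, 2],
}

hits = rank_by_similarity(query, library, top_k=2)
for name, distance in hits:
    print(name, distance)
```

Because every component is an integer count, the distance is itself an integer, and a smaller value means a more similar pair. In a virtual screening setting, the highest-ranked entries would be the candidates inspected first.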