Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Mail Stop 50F-1650, Berkeley, California 94720-8139, USA.
J Chem Inf Model. 2012 Nov 26;52(11):2902-9. doi: 10.1021/ci300289b. Epub 2012 Oct 23.
Congeners are molecules that share the same carbon skeleton but differ in the number of substituents and/or the substitution pattern. Examples are 1-chloronaphthalene, 1,4-dichloronaphthalene, and 1,3,8-trichloronaphthalene. Many persistent organic pollutants (POPs) exist in the environment as families of congeners. The very large number of possible congeners makes their experimental characterization and risk assessment infeasible. High-throughput computational modeling and quantitative structure-property relationship (QSPR) modeling have been limited by the lack of tools and approaches for analyzing such POP families. We present a comprehensive approach that enables modeling of extremely large congeneric libraries. The approach involves three steps: (1) combinatorial generation of a library of congeners, (2) quantum chemical characterization of each structure at the PM6 semiempirical level to obtain molecular descriptors, and (3) analysis of the information generated in step 2. Steps 1-3 employ combinatorial, computational, and cheminformatics techniques, respectively. This hybrid approach is therefore named "Combinatorial × Computational × Cheminformatics", abbreviated as the C³ (C-cubed) approach. We demonstrate its usefulness by generating and characterizing Br- and Cl-substituted congeneric families of 23 typical POPs. Analysis of the resulting set of 1 840 951 congeners, which includes Cl-, Br-, and mixed Br/Cl-substituted species, shows that, based on structural similarities defined by the values of the molecular descriptors, existing QSPR models originally developed for Cl- and Br-substituted congeners can also be applied to mixed Br/Cl-substituted ones. Thus, the C³ approach may serve as a tool for exploring the structural applicability domains of existing QSPR models for congeneric sets.
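Step 1, the combinatorial generation of a congener library, can be illustrated with a minimal sketch for a single skeleton. The example below is not the authors' tool: the function name `enumerate_congeners`, the `SYMMETRY` table, and the representation of a congener as a tuple of substituent labels are assumptions made for illustration. It takes naphthalene, the skeleton used in the examples above, assigns H, Cl, or Br to each of its eight substitutable positions, and collapses patterns that are equivalent under the four in-plane symmetry permutations of the skeleton.

```python
from itertools import product

# Symmetry permutations of the 8 substitutable positions of naphthalene,
# written with 0-based indices (index 0 is position 1, index 7 is position 8).
# This table is an illustrative assumption, not taken from the paper.
SYMMETRY = [
    (0, 1, 2, 3, 4, 5, 6, 7),  # identity
    (3, 2, 1, 0, 7, 6, 5, 4),  # 1<->4, 2<->3, 5<->8, 6<->7
    (7, 6, 5, 4, 3, 2, 1, 0),  # 1<->8, 2<->7, 3<->6, 4<->5
    (4, 5, 6, 7, 0, 1, 2, 3),  # 1<->5, 2<->6, 3<->7, 4<->8
]


def enumerate_congeners(substituents, n_sites=8, symmetry=SYMMETRY):
    """Return one representative pattern per symmetry-distinct congener.

    Each pattern is a tuple of labels, one per site, where "H" marks an
    unsubstituted position. The fully unsubstituted parent is skipped.
    """
    labels = ("H",) + tuple(substituents)
    seen = set()
    congeners = []
    for pattern in product(labels, repeat=n_sites):
        if all(label == "H" for label in pattern):
            continue  # the parent skeleton is not counted as a congener
        # Canonical form: the smallest image of the pattern under the group.
        canonical = min(tuple(pattern[i] for i in perm) for perm in symmetry)
        if canonical not in seen:
            seen.add(canonical)
            congeners.append(canonical)
    return congeners


if __name__ == "__main__":
    print(len(enumerate_congeners(["Cl"])))        # chlorinated only
    print(len(enumerate_congeners(["Cl", "Br"])))  # Cl, Br, and mixed Br/Cl
```

Brute-force enumeration with a canonical form is adequate for a skeleton with eight sites (3^8 = 6561 raw patterns). Skeletons with more substitutable positions would likely need an orbit-based generator rather than filtering every pattern.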