Department of Analytical Chemistry and Computer Chemistry, University of Plovdiv, 24 Tsar Assen St., Plovdiv, 4000, Bulgaria.
Ideaconsult Ltd, 4 Angel Kanchev Str., Sofia, 1000, Bulgaria.
Mol Inform. 2019 Aug;38(8-9):e1800138. doi: 10.1002/minf.201800138. Epub 2019 Jan 17.
Ambit-GCM is a new software tool for group contribution modelling (GCM), developed as part of the chemoinformatics platform AMBIT. It is an open-source tool distributed under the LGPL license, written in Java and based on the Chemistry Development Kit. Ambit-GCM provides an environment for creating models of molecular properties using additive schemes of zero, first, or second order. Ambit-GCM supports a set of local atomic attributes for dynamically configuring the desired atom descriptions, which are then applied to define fragments of different sizes. All defined groups are exhaustively generated for each molecule in a training set of compounds and combined to form the basic set of GCM fragments. Additionally, Ambit-GCM users can define correction factors via custom SMARTS notations or add externally calculated molecular descriptors. A molecular property model is obtained as a sum over all groups found, in which the frequency of each group or correction factor is multiplied by its corresponding contribution. Multiple linear regression analysis (MLRA) is used to calculate the group contributions. Ambit-GCM performs a full statistical characterization of the resulting MLRA models via various validation techniques: external test-set validation, cross-validation, y-scrambling, etc. Optionally, the software can be used for molecule fragmentation alone, in combination with an external statistical modelling package for further processing. Example usage and test cases for Ambit-GCM are given.
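The additive model described in the abstract can be illustrated with a minimal sketch. This is not Ambit-GCM code and does not use its API. The group names, fragment counts, and property values are invented for illustration, and the fit is a plain ordinary least-squares solve via the normal equations, assumed here as a stand-in for the MLRA step.

```python
# Minimal group-contribution sketch (hypothetical data, not Ambit-GCM output).
# Model: property(molecule) = sum over groups of count_i * contribution_i

# Assumed group labels; a real run would derive fragments from the training set.
group_names = ["CH3", "CH2", "OH"]

# Rows = training molecules, columns = how often each group occurs (invented).
counts = [
    [2, 0, 0],
    [2, 1, 0],
    [2, 2, 0],
    [1, 1, 1],
    [1, 2, 1],
    [1, 0, 1],
]

# Synthetic property values generated from known contributions so that the
# fit can be checked; real values would be experimental measurements.
true_contributions = [1.0, 0.5, -2.0]
y = [sum(n * c for n, c in zip(row, true_contributions)) for row in counts]


def fit_contributions(counts, y):
    """Least-squares fit by solving the normal equations (X^T X) c = X^T y."""
    k = len(counts[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(row[i] * row[j] for row in counts) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(counts, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("group counts are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(group_counts, contributions):
    """Sum of group frequency times group contribution for one molecule."""
    return sum(n * c for n, c in zip(group_counts, contributions))


contributions = fit_contributions(counts, y)
predicted = predict([2, 3, 0], contributions)  # unseen molecule: 2 CH3, 3 CH2

for name, value in zip(group_names, contributions):
    print(f"{name}: {value:.3f}")
print(f"predicted property: {predicted:.3f}")
```

Because the synthetic data are noise-free, the fitted contributions reproduce the values used to generate them. With real measurements the fit would be approximate, which is where validation steps such as cross-validation and y-scrambling become relevant.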