Technische Universität München, Institut für Informatik/I12, Garching b. München, Germany.
J Chem Inf Model. 2013 May 24;53(5):1017-25. doi: 10.1021/ci300182p. Epub 2013 Apr 29.
The concept of molecular similarity is one of the most central in the fields of predictive toxicology and quantitative structure-activity relationship (QSAR) research. Many toxicological responses result from a multimechanistic process and, consequently, structural diversity among the active compounds is likely. Combining this knowledge, we introduce similarity boosted QSAR modeling, where we calculate molecular descriptors using similarities with respect to representative reference compounds to aid a statistical learning algorithm in distinguishing between different structural classes. We present three approaches for the selection of reference compounds, one by literature search and two by clustering. Our experimental evaluation on seven publicly available data sets shows that the similarity descriptors used on their own perform quite well compared to structural descriptors. We show that the combination of similarity and structural descriptors enhances the performance and that a simple stacking approach is able to use the complementary information encoded by the different descriptor sets to further improve predictive results. All software necessary for our experiments is available within the cheminformatics software framework AZOrange.
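The idea of a similarity descriptor can be illustrated with a small sketch: each compound is described by its similarity to every reference compound, and that vector can be used alone or appended to structural descriptors. The abstract does not state which similarity measure or fingerprint the authors use, so the Tanimoto coefficient on sets of feature indices, the names `tanimoto` and `similarity_descriptors`, and the toy fingerprints below are all assumptions made for illustration, not the AZOrange implementation.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical fingerprint representation: the indices of the bits that are set.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient of two bit sets; 0.0 if both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_descriptors(
    mol: Fingerprint, references: Sequence[Fingerprint]
) -> List[float]:
    """One descriptor per reference compound: the similarity of `mol` to it."""
    return [tanimoto(mol, ref) for ref in references]


# Toy reference compounds (assumed; in the paper these come from a
# literature search or from clustering).
references = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
query = frozenset({1, 2, 3, 10})

# Similarity descriptors on their own.
sim = similarity_descriptors(query, references)

# Combined descriptor set: structural bits (here a 4-bit toy vector)
# followed by the similarity values.
structural = [1.0 if i in query else 0.0 for i in (1, 2, 3, 4)]
combined = structural + sim
```

With reference compounds drawn from different structural classes, the similarity values give a learner a compact signal of which class a compound resembles most, which is the role the abstract assigns to these descriptors.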