GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design.

Affiliation

Computational Modeling Lab-CoMo, Department of Computer Sciences, Faculty of Sciences, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium.

Publication information

J Chem Inf Model. 2012 Sep 24;52(9):2366-86. doi: 10.1021/ci300146h. Epub 2012 Aug 28.

Abstract

Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, no single modeling approach can be successfully applied to the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it yields classification results similar to or better than those reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple, since they consist of a weighted sum of the outputs of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
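
To make the "weighted sum of single-feature classifiers" idea concrete, the following is a minimal sketch of Adaboost over threshold classifiers (decision stumps), restricted to a fixed subset of descriptors. It is not the authors' implementation: the function names, the toy descriptor matrix, and the hard-coded `selected` subset (a stand-in for what a Genetic Algorithm search might return) are all assumptions made for illustration.

```python
import math

def stump_predict(row, j, t, sign):
    """Single-feature classifier: +sign if descriptor j exceeds threshold t, else -sign."""
    return sign if row[j] > t else -sign

def fit_stump(X, y, w, features):
    """Return (weighted error, feature, threshold, sign) of the best single-feature stump."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    if best is None:
        raise ValueError("no usable threshold: all selected features are constant")
    return best

def adaboost(X, y, features, rounds=5):
    """Discrete Adaboost; labels must be +1 (active) or -1 (inactive)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(X, y, w, features)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, sign))
        # Increase the weight of misclassified compounds, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, row):
    """Adaboost score: weighted sum of the single-feature classifier outputs."""
    return sum(alpha * stump_predict(row, j, t, sign) for alpha, j, t, sign in model)

# Toy data (made up): six compounds, three descriptors each.
X = [
    [0.2, 5.0, 1.1],
    [0.9, 3.0, 1.4],
    [0.4, 4.0, 0.9],
    [0.7, 1.0, 3.2],
    [0.3, 2.0, 2.8],
    [0.8, 6.0, 3.5],
]
y = [-1, -1, -1, 1, 1, 1]

selected = [0, 2]  # hypothetical descriptor subset, standing in for a GA result
model = adaboost(X, y, selected)

# Rank compounds by score, highest first, as one might after virtual screening.
ranked = sorted(range(len(X)), key=lambda i: score(model, X[i]), reverse=True)
```

The sign of `score` gives the predicted class, while its magnitude orders the compounds. In this toy set one descriptor separates the classes on its own, so every boosting round picks the same stump; real data would normally yield a mix of stumps over different descriptors.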
