Heterogeneous classifier fusion for ligand-based virtual screening: or, how decision making by committee can be a good thing.

Affiliation

Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, CH-4056 Basel, Switzerland.

Publication

J Chem Inf Model. 2013 Nov 25;53(11):2829-36. doi: 10.1021/ci400466r. Epub 2013 Nov 14.

Abstract

The concept of data fusion (combining information from different sources that describe the same object, with the aim of producing a more accurate representation) has found application in a very broad range of disciplines. In ligand-based virtual screening (VS), data fusion has been applied to combine knowledge from either different active molecules or different fingerprints to improve similarity search performance. Machine-learning (ML) methods based on the fusion of multiple homogeneous classifiers, in particular random forests, have also been widely applied in the ML literature. The heterogeneous version of classifier fusion, i.e., fusing the predictions of different model types, has been less explored. Here, we investigate heterogeneous classifier fusion for ligand-based VS using three different ML methods, random forest (RF), naïve Bayes (NB), and logistic regression (LR), with four 2D fingerprints: atom pairs, topological torsions, the RDKit fingerprint, and a circular fingerprint. The methods are compared using a previously developed benchmarking platform for 2D fingerprints, which this article extends to ML methods. The original data sets are filtered for difficulty, and a new set of challenging data sets from ChEMBL is added. Data sets were also generated for a second use case: starting from a small set of related actives instead of diverse actives. The final fused model consistently outperforms the other approaches across the broad variety of targets studied, indicating that heterogeneous classifier fusion is a very promising approach for ligand-based VS. The new data sets, together with the adapted source code for the ML methods, are provided in the Supporting Information.
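To make heterogeneous classifier fusion concrete, the following is a minimal sketch, not the authors' implementation. It assumes each trained model (for example an RF, an NB, and an LR model) has already produced one score per screening compound. It fuses the models by averaging per-model ranks, which sidesteps the fact that RF vote fractions, NB log-odds, and LR probabilities live on different scales. The abstract does not state which fusion rule the paper uses, so mean-rank fusion is an assumption here. The function names `to_ranks` and `fuse_by_mean_rank` and all scores in the example are hypothetical.

```python
from typing import Dict, List, Sequence


def to_ranks(scores: Sequence[float]) -> List[float]:
    """Convert scores to 1-based ranks (highest score -> rank 1).

    Tied scores share the mean of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of scores tied with order[pos].
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def fuse_by_mean_rank(model_scores: Dict[str, Sequence[float]]) -> List[int]:
    """Return compound indices ordered from most to least likely active.

    Hypothetical helper: the fusion rule (mean of per-model ranks) is an
    assumption for illustration, not necessarily the rule used in the paper.
    """
    score_lists = list(model_scores.values())
    if not score_lists:
        raise ValueError("at least one model is required")
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("every model must score the same compounds")
    rank_lists = [to_ranks(s) for s in score_lists]
    mean_ranks = [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]
    # Lower mean rank is better; ties are broken by compound index.
    return sorted(range(n), key=lambda i: (mean_ranks[i], i))


if __name__ == "__main__":
    # Made-up scores for four compounds from three model types on
    # different scales (vote fraction, log-odds, probability).
    scores = {
        "RF": [0.90, 0.20, 0.65, 0.10],
        "NB": [3.1, -0.5, 4.2, -2.0],
        "LR": [0.80, 0.30, 0.70, 0.05],
    }
    print(fuse_by_mean_rank(scores))  # -> [0, 2, 1, 3]
```

Rank averaging is only one possible fusion rule; max-rank or score-sum fusion are common alternatives. Because ranks discard score magnitudes, a model that is extremely confident about a single compound gets no extra weight under this scheme.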