Heterogeneous classifier fusion for ligand-based virtual screening: or, how decision making by committee can be a good thing.

Affiliation

Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, CH-4056 Basel, Switzerland.

Publication information

J Chem Inf Model. 2013 Nov 25;53(11):2829-36. doi: 10.1021/ci400466r. Epub 2013 Nov 14.

Abstract

The concept of data fusion - the combination of information from different sources describing the same object with the expectation of generating a more accurate representation - has found application in a very broad range of disciplines. In the context of ligand-based virtual screening (VS), data fusion has been applied to combine knowledge from either different active molecules or different fingerprints to improve similarity search performance. Machine-learning (ML) methods based on fusion of multiple homogeneous classifiers, in particular random forests (RF), have also been widely applied in the ML literature. The heterogeneous version of classifier fusion - fusing the predictions from different model types - has been less explored. Here, we investigate heterogeneous classifier fusion for ligand-based VS using three different ML methods, RF, naïve Bayes (NB), and logistic regression (LR), with four 2D fingerprints: atom pairs, topological torsions, RDKit fingerprint, and circular fingerprint. The methods are compared using a previously developed benchmarking platform for 2D fingerprints, which is extended to ML methods in this article. The original data sets are filtered for difficulty, and a new set of challenging data sets from ChEMBL is added. Data sets were also generated for a second use case: starting from a small set of related actives instead of diverse actives. The final fused model consistently outperforms the other approaches across the broad variety of targets studied, indicating that heterogeneous classifier fusion is a very promising approach for ligand-based VS. The new data sets together with the adapted source code for ML methods are provided in the Supporting Information.
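To make the idea of fusing predictions from different model types concrete, the following is a minimal sketch in Python. It assumes a simple mean-rank fusion rule, which is an illustrative choice only: the abstract does not say which fusion rule the authors used. The function names (`ranks_from_scores`, `fuse_by_mean_rank`) and the score values are hypothetical, standing in for the predicted activity probabilities that RF, NB, and LR models might assign to the same four test molecules.

```python
from typing import Dict, List, Sequence


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Return the rank of each molecule (1 = highest score); ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def fuse_by_mean_rank(model_scores: Dict[str, Sequence[float]]) -> List[int]:
    """Fuse several models' scores and return molecule indices, best first.

    Mean-rank fusion is an assumed rule for illustration; the abstract
    does not state which fusion rule the authors used.
    """
    if not model_scores:
        raise ValueError("need at least one model")
    per_model_ranks = [ranks_from_scores(s) for s in model_scores.values()]
    n = len(per_model_ranks[0])
    if any(len(r) != n for r in per_model_ranks):
        raise ValueError("all models must score the same molecules")
    n_models = len(per_model_ranks)
    mean_rank = [sum(r[i] for r in per_model_ranks) / n_models for i in range(n)]
    # Lower mean rank means the committee as a whole placed the molecule higher.
    return sorted(range(n), key=lambda i: mean_rank[i])


# Hypothetical predicted probabilities of activity for four test molecules.
scores = {
    "RF": [0.90, 0.20, 0.60, 0.10],
    "NB": [0.70, 0.30, 0.80, 0.05],
    "LR": [0.40, 0.50, 0.85, 0.10],
}
fused_order = fuse_by_mean_rank(scores)
print(fused_order)  # [2, 0, 1, 3]
```

In this toy input RF alone would rank molecule 0 first, but NB and LR both prefer molecule 2, so the fused ranking places molecule 2 on top. Working on ranks rather than raw scores sidesteps the fact that different model types produce scores on scales that are not directly comparable.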

