Efficient discovery of responses of proteins to compounds using active learning.

Author affiliation

Lane Center for Computational Biology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA.

Publication information

BMC Bioinformatics. 2014 May 16;15:143. doi: 10.1186/1471-2105-15-143.

DOI:10.1186/1471-2105-15-143

PMID:24884564

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4030446/

Abstract

BACKGROUND

Drug discovery and development has been aided by high throughput screening methods that detect compound effects on a single target. However, when using focused initial screening, undesirable secondary effects are often detected late in the development process after significant investment has been made. An alternative approach would be to screen against undesired effects early in the process, but the number of possible secondary targets makes this prohibitively expensive.

RESULTS

This paper describes methods for making this global approach practical by constructing predictive models for many target responses to many compounds and using them to guide experimentation. We demonstrate for the first time that by jointly modeling targets and compounds using descriptive features and using active machine learning methods, accurate models can be built by doing only a small fraction of possible experiments. The methods were evaluated by computational experiments using a dataset of 177 assays and 20,000 compounds constructed from the PubChem database.
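The idea of jointly modeling targets and compounds and letting the model choose the next experiments can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' method: the data are synthetic, the joint descriptor is the set of products of target and compound features, the model is a hand-written logistic regression, and selection is greedy (test the pairs the model rates most likely to be hits). The function names `joint_features`, `train_logistic`, and `active_screen` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def joint_features(target_vec, compound_vec):
    # Joint descriptor for one (target, compound) pair: a bias term plus
    # every product of a target feature with a compound feature.
    return [1.0] + [a * b for a in target_vec for b in compound_vec]


def train_logistic(X, y, epochs=100, lr=0.1):
    # Plain stochastic gradient descent on the logistic loss.
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def active_screen(features, is_hit, budget, batch=20, seed=0):
    # features: pair -> joint feature vector; is_hit: pair -> 0 or 1.
    rng = random.Random(seed)
    untested = sorted(features)
    rng.shuffle(untested)
    tested = {}
    # Seed round: a random batch, because no model exists yet.
    k = min(batch, budget)
    for pair in untested[:k]:
        tested[pair] = is_hit[pair]
    untested = untested[k:]
    while len(tested) < budget and untested:
        pairs = list(tested)
        w = train_logistic([features[p] for p in pairs],
                           [tested[p] for p in pairs])
        # Greedy selection: rank untested pairs by predicted hit score.
        untested.sort(
            key=lambda p: -sum(wi * xi for wi, xi in zip(w, features[p])))
        k = min(batch, budget - len(tested))
        for pair in untested[:k]:
            tested[pair] = is_hit[pair]  # "run" the experiment
        untested = untested[k:]
    return tested


# Synthetic stand-in data (assumption): 8 targets x 60 compounds, 3 features each.
rng = random.Random(1)
targets = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]
compounds = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
features, is_hit = {}, {}
for t, tv in enumerate(targets):
    for c, cv in enumerate(compounds):
        features[(t, c)] = joint_features(tv, cv)
        # Hypothetical ground truth: a hit when the two vectors align strongly.
        is_hit[(t, c)] = 1 if sum(a * b for a, b in zip(tv, cv)) > 1.5 else 0

budget = 100
tested = active_screen(features, is_hit, budget)
total_hits = sum(is_hit.values())
found = sum(tested.values())
print(f"explored {len(tested)}/{len(features)} pairs, "
      f"found {found}/{total_hits} hits")
```

Because the model sees features of both the target and the compound, a result for one pair informs predictions for pairs that share neither that target nor that compound. In practice the synthetic lookup would be replaced by a real assay, and the selection rule by whichever active learning criterion is being studied.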

CONCLUSIONS

An average of nearly 60% of all hits in the dataset were found after exploring only 3% of the experimental space, which suggests that active learning can be used to enable more complete characterization of compound effects than would otherwise be affordable. The methods described are also likely to find widespread application outside drug discovery, such as for characterizing the effects of a large number of compounds or inhibitory RNAs on a large number of cell or tissue phenotypes.
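To put these percentages in absolute terms, the short calculation below works through the arithmetic. It assumes that the experimental space is the full grid of 177 assays by 20,000 compounds, which the abstract implies but does not state, and it rounds "nearly 60%" to 60%. The comparison baseline is uniform random selection, which is expected to recover hits in proportion to the fraction of the space explored.

```python
# Figures taken from the abstract; the full-grid assumption is ours.
n_assays = 177
n_compounds = 20_000
explored_pct = 3      # percent of the experimental space explored
hits_found_pct = 60   # percent of all hits found ("nearly 60%")

space = n_assays * n_compounds                # all assay-compound pairs
experiments_run = space * explored_pct // 100
# Uniform random sampling is expected to find the same percentage of hits
# as the percentage of the space it explores, so the ratio is the enrichment.
enrichment = hits_found_pct / explored_pct

print(f"experimental space: {space:,} pairs")
print(f"experiments run at {explored_pct}%: {experiments_run:,}")
print(f"enrichment over random selection: about {enrichment:.0f}x")
```

Under these assumptions, roughly 106,000 of about 3.5 million possible experiments recover close to 60% of the hits, about twenty times what random selection would be expected to find with the same budget.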

Similar articles

Artificial intelligence in virtual screening: Models versus experiments.

Drug Discov Today. 2022 Jul;27(7):1913-1923. doi: 10.1016/j.drudis.2022.05.013. Epub 2022 May 18.

Efficient modeling and active learning discovery of biological responses.

PLoS One. 2013 Dec 17;8(12):e83996. doi: 10.1371/journal.pone.0083996. eCollection 2013.

From machine learning to deep learning: progress in machine intelligence for rational drug discovery.

Drug Discov Today. 2017 Nov;22(11):1680-1685. doi: 10.1016/j.drudis.2017.08.010. Epub 2017 Sep 4.

CAPi: Computational Model for Apicoplast Inhibitors Prediction Against Plasmodium Parasite.

Curr Comput Aided Drug Des. 2017 Nov 10;13(4):303-310. doi: 10.2174/1573409913666170301121110.

Active machine learning-driven experimentation to determine compound effects on protein patterns.

Elife. 2016 Feb 3;5:e10047. doi: 10.7554/eLife.10047.

Deciding when to stop: efficient experimentation to learn to predict drug-target interactions.

BMC Bioinformatics. 2015 Jul 9;16:213. doi: 10.1186/s12859-015-0650-9.

Using information from historical high-throughput screens to predict active compounds.

J Chem Inf Model. 2014 Jul 28;54(7):1880-91. doi: 10.1021/ci500190p. Epub 2014 Jun 26.

Deep Learning-Based Imbalanced Data Classification for Drug Discovery.

J Chem Inf Model. 2020 Sep 28;60(9):4180-4190. doi: 10.1021/acs.jcim.9b01162. Epub 2020 Jul 8.

BonMOLière: Small-Sized Libraries of Readily Purchasable Compounds, Optimized to Produce Genuine Hits in Biological Screens across the Protein Space.

Int J Mol Sci. 2021 Jul 21;22(15):7773. doi: 10.3390/ijms22157773.

Cited by

Advancing genetic engineering with active learning: theory, implementations and potential opportunities.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf286.

A Comprehensive Investigation of Active Learning Strategies for Conducting Anti-Cancer Drug Screening.

Cancers (Basel). 2024 Jan 26;16(3):530. doi: 10.3390/cancers16030530.

Active Semisupervised Model for Improving the Identification of Anticancer Peptides.

ACS Omega. 2021 Sep 8;6(37):23998-24008. doi: 10.1021/acsomega.1c03132. eCollection 2021 Sep 21.

Active learning effectively identifies a minimal set of maximally informative and asymptotically performant cytotoxic structure-activity patterns in NCI-60 cell lines.

RSC Med Chem. 2020 Jul 20;11(9):1075-1087. doi: 10.1039/d0md00110d. eCollection 2020 Sep 1.

Active semi-supervised learning for biological data classification.

PLoS One. 2020 Aug 19;15(8):e0237428. doi: 10.1371/journal.pone.0237428. eCollection 2020.

Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines.

BMC Bioinformatics. 2016 Dec 7;17(1):520. doi: 10.1186/s12859-016-1392-z.

Improving drug discovery with high-content phenotypic screens by systematic selection of reporter cell lines.

Nat Biotechnol. 2016 Jan;34(1):70-77. doi: 10.1038/nbt.3419. Epub 2015 Dec 14.

Deciding when to stop: efficient experimentation to learn to predict drug-target interactions.

BMC Bioinformatics. 2015 Jul 9;16:213. doi: 10.1186/s12859-015-0650-9.
