

Machine Learning-Driven Data Valuation for Optimizing High-Throughput Screening Pipelines.

Affiliation

Technical University of Munich, TUM School of Natural Sciences, Department of Bioscience, Center for Functional Protein Assemblies (CPA), 85748 Garching bei München, Germany.

Publication information

J Chem Inf Model. 2024 Nov 11;64(21):8142-8152. doi: 10.1021/acs.jcim.4c01547. Epub 2024 Oct 23.

Abstract

In the rapidly evolving field of drug discovery, high-throughput screening (HTS) is essential for identifying bioactive compounds. This study introduces a novel application of data valuation, a concept for evaluating the importance of data points based on their impact, to enhance drug discovery pipelines. Our approach improves active learning for compound library screening, robustly identifies true and false positives in HTS data, and identifies important inactive samples in an imbalanced HTS training set, all while accounting for computational efficiency. We demonstrate that importance-based methods enable more effective batch screening, reducing the need for extensive HTS. Machine learning models accurately differentiate true biological activity from assay artifacts, streamlining the drug discovery process. Additionally, importance undersampling aids in HTS data set balancing, improving machine learning performance without omitting crucial inactive samples. These advancements could significantly enhance the efficiency and accuracy of drug development.
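The abstract does not specify how importance scores are computed or how samples are selected, so the following is only a minimal sketch of the general idea behind importance undersampling, not the authors' method. It assumes that each sample already has a precomputed data-valuation score, that labels are encoded as 1 (active) and 0 (inactive), and that the most important inactives are the ones worth keeping. The function name `importance_undersample` and the `ratio` parameter are hypothetical.

```python
from typing import List, Sequence


def importance_undersample(
    labels: Sequence[int],
    importance: Sequence[float],
    ratio: float = 1.0,
) -> List[int]:
    """Return indices to keep: all actives plus the most important inactives.

    labels      -- 1 for active, 0 for inactive (assumed encoding)
    importance  -- one precomputed data-valuation score per sample (assumed input)
    ratio       -- inactives kept per active (hypothetical parameter)
    """
    if len(labels) != len(importance):
        raise ValueError("labels and importance must have the same length")

    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]

    # Number of inactives to keep, capped by how many exist.
    n_keep = min(len(inactives), int(round(ratio * len(actives))))

    # Rank inactives from most to least important and keep the top n_keep.
    ranked = sorted(inactives, key=lambda i: importance[i], reverse=True)
    return sorted(actives + ranked[:n_keep])


# Toy data: 2 actives, 6 inactives, with made-up importance scores.
labels = [1, 0, 0, 0, 0, 1, 0, 0]
importance = [0.9, 0.1, 0.7, 0.05, 0.4, 0.8, 0.6, 0.2]
kept = importance_undersample(labels, importance)
print(kept)  # [0, 2, 5, 6]
```

With `ratio=1.0` the retained set is balanced: both actives are kept together with the two highest-scoring inactives, in contrast to random undersampling, which could discard them by chance.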


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ddf/11558681/f9f6f5331cba/ci4c01547_0001.jpg
