Machine Learning on DNA-Encoded Library Count Data Using an Uncertainty-Aware Probabilistic Loss Function.

Affiliations

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.

Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.

Publication information

J Chem Inf Model. 2022 May 23;62(10):2316-2331. doi: 10.1021/acs.jcim.2c00041. Epub 2022 May 10.

Abstract

DNA-encoded library (DEL) screening and quantitative structure-activity relationship (QSAR) modeling are two techniques used in drug discovery to find novel small molecules that bind a protein target. Applying QSAR modeling to DEL selection data can facilitate the selection of compounds for off-DNA synthesis and evaluation. Such a combined approach has recently been implemented by training binary classifiers to learn DEL enrichments of aggregated "disynthons" in order to accommodate the sparse and noisy nature of DEL data. However, a binary classification model cannot distinguish between different levels of enrichment, and information is potentially lost during disynthon aggregation. Here, we demonstrate a regression approach to learning DEL enrichments of individual molecules, using a custom negative-log-likelihood loss function that effectively denoises DEL data and introduces opportunities for visualization of learned structure-activity relationships. Our approach explicitly models the Poisson statistics of the sequencing process used in the DEL experimental workflow under a frequentist view. We illustrate this approach on a DEL dataset of 108,528 compounds screened against carbonic anhydrase IX (CAIX), and a dataset of 5,655,000 compounds screened against soluble epoxide hydrolase (sEH) and SIRT2. Because the negative-log-likelihood loss used during training accounts for uncertainty in the data, the models can ignore low-confidence outliers. While our approach does not demonstrate a benefit for extrapolation to novel structures, we expect our denoising and visualization pipeline to be useful in identifying structure-activity trends and highly enriched pharmacophores in DEL data. Further, this approach to uncertainty-aware regression modeling is applicable to other sparse or noisy datasets where the nature of stochasticity is known or can be modeled; in particular, the Poisson enrichment ratio metric we use can apply to other settings that compare sequencing count data between two experimental conditions.
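To make the idea of an uncertainty-aware loss on count data concrete, the following is a minimal sketch and not the loss function or enrichment metric defined in the paper. It assumes a standard textbook construction: if the target and control counts of a compound are Poisson distributed, then conditional on their sum the target count is binomial, which gives a likelihood for the enrichment ratio. All function names, counts, and read depths below are hypothetical and chosen only for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def enrichment_nll(log_ratio, k_target, k_control, n_target, n_control):
    """Negative log-likelihood of a predicted log enrichment ratio.

    Assumption (illustrative, not taken from the paper): counts are Poisson,
    so conditional on k_target + k_control the target count is binomial with
    p = R * n_target / (R * n_target + n_control), where R = exp(log_ratio)
    and n_target, n_control are the total read depths of the two conditions.
    The constant binomial coefficient is dropped.
    """
    logit = log_ratio + math.log(n_target / n_control)
    # log p = -softplus(-logit) and log(1 - p) = -softplus(logit)
    return k_target * softplus(-logit) + k_control * softplus(logit)


def mle_log_ratio(k_target, k_control, n_target, n_control):
    """Closed-form maximum-likelihood log ratio (both counts must be > 0)."""
    return math.log((k_target / n_target) / (k_control / n_control))


def penalty(delta, k_target, k_control, n_target, n_control):
    """Extra loss incurred by predicting `delta` log units away from the MLE."""
    best = mle_log_ratio(k_target, k_control, n_target, n_control)
    return (
        enrichment_nll(best + delta, k_target, k_control, n_target, n_control)
        - enrichment_nll(best, k_target, k_control, n_target, n_control)
    )


# Hypothetical read depths and two compounds with the same observed ratio (2:1)
# but very different numbers of supporting reads.
DEPTH_TARGET = 1_000_000
DEPTH_CONTROL = 1_000_000
low_count_penalty = penalty(1.0, 2, 1, DEPTH_TARGET, DEPTH_CONTROL)
high_count_penalty = penalty(1.0, 200, 100, DEPTH_TARGET, DEPTH_CONTROL)

print("penalty with 2 vs 1 reads:    ", low_count_penalty)
print("penalty with 200 vs 100 reads:", high_count_penalty)
```

In this sketch, a prediction that misses the observed ratio by the same amount is penalized far less for the compound with only three reads than for the compound with three hundred. That is the general mechanism by which a likelihood-based loss lets a regression model discount low-confidence outliers, whereas a squared-error loss on the raw ratios would treat both compounds identically.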
