Machine-Learning-Based Data Analysis Method for Cell-Based Selection of DNA-Encoded Libraries.

Authors

Hou Rui, Xie Chao, Gui Yuhan, Li Gang, Li Xiaoyu

Affiliations

Department of Chemistry and State Key Laboratory of Synthetic Chemistry, The University of Hong Kong, Hong Kong SAR, China.

Laboratory for Synthetic Chemistry and Chemical Biology Limited, Health@InnoHK, Innovation and Technology Commission, Hong Kong SAR, China.

Publication details

ACS Omega. 2023 May 15;8(21):19057-19071. doi: 10.1021/acsomega.3c02152. eCollection 2023 May 30.

DOI: 10.1021/acsomega.3c02152

PMID: 37273617

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10233830/

Abstract

DNA-encoded library (DEL) is a powerful ligand discovery technology that has been widely adopted in the pharmaceutical industry. DEL selections are typically performed with a purified protein target immobilized on a matrix or in solution phase. Recently, DELs have also been used to interrogate the targets in the complex biological environment, such as membrane proteins on live cells. However, due to the complex landscape of the cell surface, the selection inevitably involves significant nonspecific interactions, and the selection data are much noisier than the ones with purified proteins, making reliable hit identification highly challenging. Researchers have developed several approaches to denoise DEL datasets, but it remains unclear whether they are suitable for cell-based DEL selections. Here, we report the proof-of-principle of a new machine-learning (ML)-based approach to process cell-based DEL selection datasets by using a Maximum A Posteriori (MAP) estimation loss function, a probabilistic framework that can account for and quantify uncertainties of noisy data. We applied the approach to a DEL selection dataset, where a library of 7,721,415 compounds was selected against a purified carbonic anhydrase 2 (CA-2) and a cell line expressing the membrane protein carbonic anhydrase 12 (CA-12). The extended-connectivity fingerprint (ECFP)-based regression model using the MAP loss function was able to identify true binders and also reliable structure-activity relationship (SAR) from the noisy cell-based selection datasets. In addition, the regularized enrichment metric (known as MAP enrichment) could also be calculated directly without involving the specific machine-learning model, effectively suppressing low-confidence outliers and enhancing the signal-to-noise ratio. Future applications of this method will focus on de novo ligand discovery from cell-based DEL selections.
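The abstract notes that a regularized (MAP) enrichment can be computed directly from counts, without a trained model. The snippet below is a minimal sketch of that general idea and is not the authors' formulation: it assumes read counts are Poisson distributed and places a Gamma prior on each rate, so the MAP estimate has a closed form. The names `GammaPrior`, `map_rate`, and `map_enrichment`, the prior values, and the notion of "exposure" (expected reads per compound at no enrichment) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaPrior:
    """Gamma(shape, rate) prior on a Poisson rate (illustrative assumption)."""
    shape: float = 2.0  # with rate=1.0 the prior mode is 1.0, i.e. no enrichment
    rate: float = 1.0


def map_rate(count: int, exposure: float, prior: GammaPrior = GammaPrior()) -> float:
    """MAP estimate of a Poisson rate per unit exposure.

    With a Gamma(shape, rate) prior and `count` events observed over
    `exposure`, the posterior is Gamma(shape + count, rate + exposure),
    whose mode is (shape + count - 1) / (rate + exposure).
    """
    if count < 0 or exposure <= 0:
        raise ValueError("count must be >= 0 and exposure must be > 0")
    posterior_shape = prior.shape + count
    posterior_rate = prior.rate + exposure
    # The mode of a Gamma distribution is 0 when its shape is below 1.
    return max(posterior_shape - 1.0, 0.0) / posterior_rate


def map_enrichment(
    target_count: int,
    control_count: int,
    target_exposure: float,
    control_exposure: float,
    prior: GammaPrior = GammaPrior(),
) -> float:
    """Ratio of MAP rates: target selection over control selection."""
    target = map_rate(target_count, target_exposure, prior)
    control = map_rate(control_count, control_exposure, prior)
    if control == 0.0:
        raise ValueError("control MAP rate is zero; use a prior with shape > 1")
    return target / control


if __name__ == "__main__":
    # Low-count compound: 3 reads in the target sample, 0 in the control.
    print(map_enrichment(3, 0, 1.0, 1.0))
    # Well-sampled compound: 60 reads in the target sample, 5 in the control.
    print(map_enrichment(60, 5, 1.0, 1.0))
```

Under these assumed settings, the 3-versus-0 compound, whose raw count ratio is undefined, is pulled to an enrichment of 4.0, while the 60-versus-5 compound stays near its raw ratio at about 10.2. This is the qualitative behavior the abstract describes: low-confidence outliers are suppressed while well-supported signals are largely preserved.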

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2837/10233830/e9da82f4fd62/ao3c02152_0006.jpg

Similar articles

Evolution of the Selection Methods of DNA-Encoded Chemical Libraries.

Acc Chem Res. 2021 Sep 7;54(17):3491-3503. doi: 10.1021/acs.accounts.1c00375. Epub 2021 Aug 24.

Recent Advances on the Selection Methods of DNA-Encoded Libraries.

Chembiochem. 2021 Jul 15;22(14):2384-2397. doi: 10.1002/cbic.202100144. Epub 2021 May 7.

Machine Learning on DNA-Encoded Library Count Data Using an Uncertainty-Aware Probabilistic Loss Function.

J Chem Inf Model. 2022 May 23;62(10):2316-2331. doi: 10.1021/acs.jcim.2c00041. Epub 2022 May 10.

Denoising DNA Encoded Library Screens with Sparse Learning.

ACS Comb Sci. 2020 Aug 10;22(8):410-421. doi: 10.1021/acscombsci.0c00007. Epub 2020 Jun 26.

Selection of DNA-encoded chemical libraries against endogenous membrane proteins on live cells.

Nat Chem. 2021 Jan;13(1):77-88. doi: 10.1038/s41557-020-00605-x. Epub 2020 Dec 21.

Comparative evaluation of DNA-encoded chemical selections performed using DNA in single-stranded or double-stranded format.

Biochem Biophys Res Commun. 2020 Dec 3;533(2):223-229. doi: 10.1016/j.bbrc.2020.04.035. Epub 2020 May 5.

Affinity Selections of DNA-Encoded Chemical Libraries on Carbonic Anhydrase IX-Expressing Tumor Cells Reveal a Dependence on Ligand Valence.

Chemistry. 2021 Jun 21;27(35):8985-8993. doi: 10.1002/chem.202100816. Epub 2021 May 18.

Recent advances in DNA-encoded dynamic libraries.

RSC Chem Biol. 2022 Feb 17;3(4):407-419. doi: 10.1039/d2cb00007e. eCollection 2022 Apr 6.

Converting Double-Stranded DNA-Encoded Libraries (DELs) to Single-Stranded Libraries for More Versatile Selections.

ACS Omega. 2022 Mar 24;7(13):11491-11500. doi: 10.1021/acsomega.2c01152. eCollection 2022 Apr 5.

Cited by

Solid-phase DNA-encoded library synthesis: a master builder's instructions.

Nat Protoc. 2025 May 22. doi: 10.1038/s41596-025-01190-4.

Widespread false negatives in DNA-encoded library data: how linker effects impair machine learning-based lead prediction.

Chem Sci. 2025 May 9. doi: 10.1039/d5sc00844a.

Open-Source DNA-Encoded Library Package for Design, Decoding and Analysis: DELi.

bioRxiv. 2025 Mar 1:2025.02.25.640184. doi: 10.1101/2025.02.25.640184.

A Target Class Ligandability Evaluation of WD40 Repeat-Containing Proteins.

J Med Chem. 2025 Jan 23;68(2):1092-1112. doi: 10.1021/acs.jmedchem.4c02010. Epub 2024 Nov 4.

Encoding and display technologies for combinatorial libraries in drug discovery: The coming of age from biology to therapy.

Acta Pharm Sin B. 2024 Aug;14(8):3362-3384. doi: 10.1016/j.apsb.2024.04.006. Epub 2024 Apr 10.

Machine learning in preclinical drug discovery.

Nat Chem Biol. 2024 Aug;20(8):960-973. doi: 10.1038/s41589-024-01679-1. Epub 2024 Jul 19.

Evolution of chemistry and selection technology for DNA-encoded library.

Acta Pharm Sin B. 2024 Feb;14(2):492-516. doi: 10.1016/j.apsb.2023.10.001. Epub 2023 Oct 11.
