Nicolet Benoît P, Jurgens Anouk P, Bresser Kaspar, Bradarić Antonia, Guislain Aurélie, Wolkers Monika C
T cell differentiation lab, Department of Research, Sanquin Blood Supply Foundation, Plesmanlaan 125, 1066 CX Amsterdam, Netherlands.
Landsteiner Laboratory, Amsterdam Institute for Infection & Immunity, Cancer Center Amsterdam, University of Amsterdam, Amsterdam UMC, Meibergdreef 9, 1105 AZ Amsterdam, Netherlands.
Sci Adv. 2025 Jul 25;11(30):eads0510. doi: 10.1126/sciadv.ads0510. Epub 2025 Jul 23.
Accurate protein expression in human immune cells is essential for appropriate cellular function. The mechanisms that define protein abundance are complex and operate at the transcriptional, posttranscriptional, and posttranslational levels. Here, we present SONAR, a machine learning pipeline that learns the endogenous sequence code that defines protein abundance in human cells. SONAR uses thousands of sequence features (SFs) to predict up to 63% of protein abundance, independently of promoter or enhancer information. SONAR uncovered cell type-specific and activation-dependent usage of SFs. The knowledge captured by SONAR provides a map of potentially biologically active SFs, which can be leveraged to manipulate the amplitude, timing, and cell type specificity of protein expression. SONAR informed the design of enhancer sequences that boost T cell receptor expression and potentiate T cell function. Beyond providing fundamental insights into the regulation of protein expression, our study thus offers innovative means to improve therapeutic and biotechnology applications.