Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.
Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad342.
Cell-surface proteins play a critical role in cell function and are primary targets for therapeutics. CITE-seq is a single-cell technique that enables simultaneous measurement of gene and surface protein expression. It is powerful but costly and technically challenging. Computational methods have been developed to predict surface protein expression from gene expression information, such as single-cell RNA sequencing (scRNA-seq) data. Existing methods, however, are computationally demanding and lack the interpretability needed to reveal underlying biological processes. We propose CrossmodalNet, an interpretable machine learning model, to predict surface protein expression from scRNA-seq data. Our model, trained with a customized adaptive loss, accurately predicts surface protein abundances. When samples from multiple time points are given, our model encodes temporal information into an easy-to-interpret time embedding to make predictions in a time-point-specific manner, and is able to uncover noise-free causal gene-protein relationships. Using three publicly available time-resolved CITE-seq data sets, we validate the performance of our model by comparing it with benchmark methods and evaluate its interpretability. Together, we show that our method accurately and interpretably profiles surface protein expression using scRNA-seq data, thereby expanding the capacity of CITE-seq experiments for investigating molecular mechanisms involving surface proteins.