
HArmonized single-cell RNA-seq Cell type Assisted Deconvolution (HASCAD).

Affiliations

Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, 112, Taiwan.

Department of Biomedical Engineering, Ming Chuan University, Taoyuan, 333, Taiwan.

Publication information

BMC Med Genomics. 2023 Oct 31;16(Suppl 2):272. doi: 10.1186/s12920-023-01674-w.

Abstract

BACKGROUND

Cell composition deconvolution (CCD) is a bioinformatic task that estimates cell fractions from bulk gene expression profiles, such as bulk RNA-seq. Many CCD models perform linear regression using reference gene expression signatures of distinct cell types. These reference signatures can be generated from cell-specific gene expression profiles, such as scRNA-seq data. However, the batch effects and dropout events frequently observed across scRNA-seq datasets limit the performance of CCD methods.
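To make the regression idea concrete, the following is a minimal sketch of signature-based deconvolution. It is not the HASCAD model or any published CCD tool: it assumes a noise-free bulk profile that is an exact linear mixture of the signatures, uses ordinary least squares followed by clipping and renormalization, and all names (`deconvolve_lstsq`, `signatures`, `true_fractions`) and the toy data are hypothetical.

```python
import numpy as np


def deconvolve_lstsq(bulk, signatures):
    """Estimate cell fractions from one bulk profile by least squares.

    bulk:       (n_genes,) bulk expression vector
    signatures: (n_genes, n_cell_types) reference signature matrix
    Returns a non-negative vector of fractions that sums to 1.
    """
    # Solve signatures @ coef ~= bulk in the least-squares sense.
    coef, *_ = np.linalg.lstsq(signatures, bulk, rcond=None)
    # Fractions cannot be negative, so clip and renormalize (a simplification).
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total == 0:
        # Degenerate case: fall back to a uniform estimate.
        return np.full(signatures.shape[1], 1.0 / signatures.shape[1])
    return coef / total


# Toy data (assumed, for illustration only): 200 genes, 3 cell types.
rng = np.random.default_rng(0)
signatures = rng.gamma(shape=2.0, scale=50.0, size=(200, 3))
true_fractions = np.array([0.6, 0.3, 0.1])
bulk = signatures @ true_fractions  # noise-free mixture

estimated = deconvolve_lstsq(bulk, signatures)
print(np.round(estimated, 3))
```

With real data the mixture is noisy and the signatures themselves carry batch effects and dropout, which is the limitation the abstract describes.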

METHODS

We developed a deep neural network (DNN) model, HASCAD, to predict the cell fractions of up to 15 immune cell types. HASCAD was trained on bulk RNA-seq profiles simulated from three scRNA-seq datasets that had been normalized using a Harmony-Symphony based strategy. Mean square error and the Pearson correlation coefficient were used to compare the performance of HASCAD with that of other widely used CCD methods. The benchmarks used two types of data: a set of simulated bulk RNA-seq profiles and three human PBMC RNA-seq datasets.
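The two ingredients named above, simulated bulk profiles and the two evaluation metrics, can be sketched as follows. This is an illustrative assumption about how pseudo-bulk samples are commonly built (summing randomly sampled single cells), not the authors' exact simulation procedure; the function names and the Poisson toy counts are hypothetical.

```python
import numpy as np


def simulate_bulk(sc_counts, cell_labels, n_cells, rng):
    """Build one pseudo-bulk profile by summing randomly sampled cells.

    sc_counts:   (n_cells_total, n_genes) single-cell count matrix
    cell_labels: (n_cells_total,) array of cell type names
    Returns (bulk, cell_types, fractions), where fractions are the true
    proportions of each cell type among the sampled cells.
    """
    idx = rng.choice(sc_counts.shape[0], size=n_cells, replace=True)
    bulk = sc_counts[idx].sum(axis=0)
    cell_types = np.unique(cell_labels)
    sampled = cell_labels[idx]
    fractions = np.array([(sampled == t).mean() for t in cell_types])
    return bulk, cell_types, fractions


def mse(y_true, y_pred):
    """Mean square error between true and predicted fractions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two fraction vectors."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Toy single-cell data (assumed): 300 cells, 50 genes, 3 cell types.
rng = np.random.default_rng(1)
sc_counts = rng.poisson(5, size=(300, 50))
cell_labels = np.repeat(np.array(["B", "CD4_T", "CD8_T"]), 100)

bulk, cell_types, fractions = simulate_bulk(sc_counts, cell_labels, 500, rng)
print(dict(zip(cell_types, np.round(fractions, 3))))
```

A trained model's predicted fractions for each simulated sample would then be scored against `fractions` with `mse` and `pearson`.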

RESULTS

HASCAD is useful for investigating how immune cell heterogeneity affects the therapeutic effects of immune checkpoint inhibitors, because its target cell types include those known to play a role in anti-tumor immunity, such as three subtypes of CD8 T cells and three subtypes of CD4 T cells. We found that removing batch effects from the reference scRNA-seq datasets benefits the CCD task. Our benchmarks showed that HASCAD is more suitable for analyzing bulk RNA-seq data than two widely used CCD methods, CIBERSORTx and quanTIseq. We applied HASCAD to the liver cancer samples of TCGA-LIHC and found that the predicted abundances of Treg and effector CD8 T cells were significantly associated with patients' overall survival.

CONCLUSION

HASCAD can predict the cell composition of PBMC bulk RNA-seq samples and classify the cell type of pure bulk RNA-seq samples. The HASCAD model is available at https://github.com/holiday01/HASCAD .
