Phenotype prediction from single-cell RNA-seq data using attention-based neural networks.

Affiliations

School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.

Department of Urologic Sciences, University of British Columbia, Vancouver BC V5Z 1M9, Canada.

Publication information

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae067.

DOI:10.1093/bioinformatics/btae067

PMID:38390963

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10902676/

Abstract

MOTIVATION

A patient's disease phenotype can be driven and determined by specific groups of cells whose marker genes are either unknown or can only be detected at a late stage using conventional bulk assays such as RNA-Seq technology. Recent advances in single-cell RNA sequencing (scRNA-seq) enable gene expression profiling at cell-level resolution, and therefore have the potential to identify the cells driving the disease phenotype even when the number of these cells is small. However, most existing methods rely heavily on accurate cell type detection, and the number of available annotated samples is usually too small for training deep learning predictive models.

RESULTS

Here, we propose the method ScRAT for phenotype prediction using scRNA-seq data. To train ScRAT with a limited number of samples of different phenotypes, such as coronavirus disease (COVID) and non-COVID, ScRAT first applies a mixup module to increase the number of training samples. A multi-head attention mechanism is employed to learn the most informative cells for each phenotype without relying on a given cell type annotation. Using three public COVID datasets, we show that ScRAT outperforms other phenotype prediction methods. The performance edge of ScRAT over its competitors increases as the number of training samples decreases, indicating the efficacy of our sample mixup. Critical cell types detected based on high-attention cells also support novel findings in the original papers and the recent literature. This suggests that ScRAT overcomes the challenges of missing marker genes and limited sample numbers, with great potential for revealing novel molecular mechanisms and/or therapies.
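The two ideas named above, sample mixup and attention over cells, can be illustrated with a minimal NumPy sketch. This is not the ScRAT implementation: it assumes two samples with the same number of cells, blends them with a Beta-distributed weight (one common form of mixup), and replaces transformer-style multi-head attention with a simpler attention pooling that uses one query vector per head. The function names, matrix sizes, and the random `queries` (which would be learned in a real model) are all hypothetical.

```python
import numpy as np


def mixup_samples(cells_a, label_a, cells_b, label_b, lam):
    """Blend two equally sized cell-by-gene matrices and their one-hot labels."""
    mixed_cells = lam * cells_a + (1.0 - lam) * cells_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_cells, mixed_label


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_pool(cells, queries):
    """Pool cells into one vector per head with scaled dot-product attention.

    cells:   (n_cells, n_genes) expression matrix of one sample
    queries: (n_heads, n_genes) one query per head (hypothetical, random here)
    """
    scores = queries @ cells.T / np.sqrt(cells.shape[1])  # (n_heads, n_cells)
    weights = softmax(scores, axis=1)                     # each row sums to 1
    pooled = weights @ cells                              # (n_heads, n_genes)
    return pooled, weights


rng = np.random.default_rng(0)
covid = rng.random((50, 20))      # toy sample: 50 cells x 20 genes
non_covid = rng.random((50, 20))  # toy sample with the other phenotype

lam = rng.beta(0.5, 0.5)          # mixing weight; Beta parameters are assumed
mixed_cells, mixed_label = mixup_samples(
    covid, np.array([1.0, 0.0]), non_covid, np.array([0.0, 1.0]), lam
)

queries = rng.normal(size=(4, 20))  # 4 heads
pooled, weights = attention_pool(mixed_cells, queries)

# Cells ranked by average attention across heads, highest first.
top_cells = np.argsort(weights.mean(axis=0))[::-1][:5]
```

In this sketch the attention weights play the role the abstract assigns to high-attention cells: the cells with the largest weights are the ones that contribute most to the pooled sample representation, and no cell type annotation is needed to rank them.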

AVAILABILITY AND IMPLEMENTATION

The code of our proposed method ScRAT is published at https://github.com/yuzhenmao/ScRAT.


Similar articles

ProtoCell4P: an explainable prototype-based neural network for patient classification using single-cell RNA-seq.

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad493.

Deep enhanced constraint clustering based on contrastive learning for scRNA-seq data.

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad222.

Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network.

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac018.

CellVGAE: an unsupervised scRNA-seq analysis workflow with graph attention networks.

Bioinformatics. 2022 Feb 7;38(5):1277-1286. doi: 10.1093/bioinformatics/btab804.

A machine learning-based method for automatically identifying novel cells in annotating single-cell RNA-seq data.

Bioinformatics. 2022 Oct 31;38(21):4885-4892. doi: 10.1093/bioinformatics/btac617.

TripletCell: a deep metric learning framework for accurate annotation of cell types at the single-cell level.

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad132.

scGCL: an imputation method for scRNA-seq data based on graph contrastive learning.

Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad098.

scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception.

Genomics Proteomics Bioinformatics. 2022 Oct;20(5):939-958. doi: 10.1016/j.gpb.2022.12.008. Epub 2023 Jan 3.

CL-Impute: A contrastive learning-based imputation for dropout single-cell RNA-seq data.

Comput Biol Med. 2023 Sep;164:107263. doi: 10.1016/j.compbiomed.2023.107263. Epub 2023 Jul 23.

Cited by

TissueFormer: a neural network for labeling tissue from grouped single-cell RNA profiles.

bioRxiv. 2025 Aug 19:2025.08.17.670735. doi: 10.1101/2025.08.17.670735.

Exploring machine learning strategies for single-cell transcriptomic analysis in wound healing.

Burns Trauma. 2025 May 13;13:tkaf032. doi: 10.1093/burnst/tkaf032. eCollection 2025.

Incorporating hierarchical information into multiple instance learning for patient phenotype prediction with single-cell RNA-sequencing data.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i96-i104. doi: 10.1093/bioinformatics/btaf241.

cytoGPNet: Enhancing Clinical Outcome Prediction Accuracy Using Longitudinal Cytometry Data in Small Cohort Studies.

bioRxiv. 2025 May 7:2025.05.01.651729. doi: 10.1101/2025.05.01.651729.

Uncovering gene and cellular signatures of immune checkpoint response via machine learning and single-cell RNA-seq.

NPJ Precis Oncol. 2025 Apr 2;9(1):95. doi: 10.1038/s41698-025-00883-z.

Challenges in AI-driven Biomedical Multimodal Data Fusion and Analysis.

Genomics Proteomics Bioinformatics. 2025 May 10;23(1). doi: 10.1093/gpbjnl/qzaf011.

Bioinformatics and molecular biology tools for diagnosis, prevention, treatment and prognosis of COVID-19.

Heliyon. 2024 Jul 11;10(14):e34393. doi: 10.1016/j.heliyon.2024.e34393. eCollection 2024 Jul 30.

Explainable deep neural networks for predicting sample phenotypes from single-cell transcriptomics.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae673.

Differential expression and co-expression reveal cell types relevant to genetic disorder phenotypes.

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae646.

Advancing bioinformatics with large language models: components, applications and perspectives.

ArXiv. 2025 Jan 31:arXiv:2401.04155v2.
