Phenotype prediction from single-cell RNA-seq data using attention-based neural networks.

Author affiliations

School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.

Department of Urologic Sciences, University of British Columbia, Vancouver, BC V5Z 1M9, Canada.

Publication information

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae067.

Abstract

MOTIVATION

A patient's disease phenotype can be driven and determined by specific groups of cells whose marker genes are either unknown or can only be detected at a late stage using conventional bulk assays such as RNA-Seq. Recent advances in single-cell RNA sequencing (scRNA-seq) enable gene expression profiling at cell-level resolution, and therefore have the potential to identify the cells driving the disease phenotype even when the number of these cells is small. However, most existing methods rely heavily on accurate cell type detection, and the number of available annotated samples is usually too small for training deep learning predictive models.

RESULTS

Here, we propose ScRAT, a method for phenotype prediction using scRNA-seq data. To train ScRAT with a limited number of samples of different phenotypes, such as coronavirus disease (COVID) and non-COVID, ScRAT first applies a mixup module to increase the number of training samples. A multi-head attention mechanism is then employed to learn the most informative cells for each phenotype without relying on a given cell type annotation. Using three public COVID datasets, we show that ScRAT outperforms other phenotype prediction methods. The performance edge of ScRAT over its competitors increases as the number of training samples decreases, indicating the efficacy of our sample mixup. Critical cell types detected based on high-attention cells also support novel findings in the original papers and the recent literature. This suggests that ScRAT overcomes the challenges of missing marker genes and limited sample numbers, with great potential for revealing novel molecular mechanisms and/or therapies.
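The two ideas named above, sample mixup and attention over cells, can be illustrated with a minimal sketch. This is not ScRAT's implementation (see the repository linked under Availability for that). It assumes the generic mixup formulation, in which two samples and their one-hot labels are blended with a coefficient drawn from a Beta distribution, and a single-head scaled dot-product attention with no learned projections. The function names, the `alpha` default, and the requirement that both samples contain the same number of cells are all assumptions made for illustration.

```python
import math
import random


def mixup_samples(cells_a, label_a, cells_b, label_b, alpha=0.5, rng=random):
    """Blend two samples (equal-sized lists of cell vectors) and their one-hot labels.

    Assumption: cells are paired by position, so both samples must already
    have been subsampled to the same number of cells.
    """
    if len(cells_a) != len(cells_b):
        raise ValueError("this sketch assumes both samples have the same number of cells")
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    mixed_cells = [
        [lam * xa + (1.0 - lam) * xb for xa, xb in zip(cell_a, cell_b)]
        for cell_a, cell_b in zip(cells_a, cells_b)
    ]
    mixed_label = [lam * ya + (1.0 - lam) * yb for ya, yb in zip(label_a, label_b)]
    return mixed_cells, mixed_label, lam


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(cells):
    """Row i holds the attention that cell i (query) pays to every cell (key).

    Simplification: queries and keys are the raw cell vectors; a trained
    model would use learned projections and several heads.
    """
    scale = math.sqrt(len(cells[0]))
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / scale for key in cells])
        for query in cells
    ]


def attention_received(cells):
    """Average attention each cell receives across all queries."""
    weights = self_attention_weights(cells)
    n = len(cells)
    return [sum(row[j] for row in weights) / n for j in range(n)]


if __name__ == "__main__":
    covid = [[1.0, 0.0], [2.0, 0.0]]      # toy sample: 2 cells, 2 features
    healthy = [[0.0, 1.0], [0.0, 2.0]]
    cells, label, lam = mixup_samples(covid, [1.0, 0.0], healthy, [0.0, 1.0])
    print("lambda:", lam, "soft label:", label)
    print("attention received per cell:", attention_received(cells))
```

Ranking cells by `attention_received` is one simple way to surface "high-attention" cells in such a toy setting; how ScRAT aggregates attention across heads and layers is not specified in this abstract.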

AVAILABILITY AND IMPLEMENTATION

The code of our proposed method ScRAT is published at https://github.com/yuzhenmao/ScRAT.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/581c/10902676/a3e4940de9dd/btae067f1.jpg
