

scMUSCL: multi-source transfer learning for clustering scRNA-seq data.

Author information

Khoeini Arash, Sar Funda, Lin Yen-Yi, Collins Colin, Ester Martin

Affiliations

School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada.

Vancouver Prostate Centre, Vancouver, British Columbia V6H 3Z6, Canada.

Publication information

Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf137.

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) analysis relies heavily on effective clustering to facilitate numerous downstream applications. Although several machine learning methods have been developed to enhance single-cell clustering, most are fully unsupervised and overlook the rich repository of annotated datasets available from previous single-cell experiments. Since cells are inherently high-dimensional entities, unsupervised clustering can often result in clusters that lack biological relevance. Leveraging annotated scRNA-seq datasets as a reference can significantly enhance clustering performance, enabling the identification of biologically meaningful clusters in target datasets.

RESULTS

In this article, we propose Single Cell MUlti-Source CLustering (scMUSCL), a novel transfer learning method designed to identify cell clusters in a target dataset by leveraging knowledge from multiple annotated reference datasets. scMUSCL employs a deep neural network to extract domain- and batch-invariant cell representations, effectively addressing discrepancies across various source datasets and between source and target datasets within the new representation space. Unlike existing methods, scMUSCL does not require prior knowledge of the number of clusters in the target dataset and eliminates the need for batch correction between source and target datasets. We conduct extensive experiments using 20 real-life datasets, demonstrating that scMUSCL consistently outperforms existing unsupervised and transfer learning-based methods. Furthermore, our experiments show that scMUSCL benefits from multiple source datasets as learning references and accurately estimates the number of clusters.
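The abstract does not spell out scMUSCL's algorithm. The snippet below is therefore only a hypothetical illustration of one generic way that annotated references can guide target clustering without fixing the number of clusters in advance: nearest-prototype assignment in a shared embedding space. It is not scMUSCL's method. The function names `cell_type_prototypes` and `cluster_target` and the `min_size` threshold are illustrative. The snippet also assumes two things: cell embeddings are already domain- and batch-invariant (the part scMUSCL learns with a deep neural network), and cell-type labels are harmonized across the reference datasets.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def cell_type_prototypes(sources):
    """Average the embeddings of each labelled cell type across all reference datasets.

    Hypothetical helper. `sources` is a list of (embeddings, labels) pairs, one per
    annotated reference. Embeddings are ASSUMED to already share a batch-invariant space,
    and identical label strings are ASSUMED to denote the same cell type in every source.
    """
    sums = {}
    counts = defaultdict(int)
    for embeddings, labels in sources:
        for vec, label in zip(embeddings, labels):
            if label not in sums:
                sums[label] = [0.0] * len(vec)
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def cluster_target(target, prototypes, min_size=1):
    """Assign each target cell to its nearest prototype and estimate the cluster count.

    Hypothetical helper. A prototype counts as a cluster only if at least `min_size`
    target cells are assigned to it, so the number of clusters is not fixed beforehand.
    """
    assignments = [
        min(prototypes, key=lambda label: dist(cell, prototypes[label])) for cell in target
    ]
    sizes = defaultdict(int)
    for label in assignments:
        sizes[label] += 1
    n_clusters = sum(1 for size in sizes.values() if size >= min_size)
    return assignments, n_clusters


# Toy usage with two made-up 2-D references and a three-cell target.
source_a = ([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]], ["T cell", "T cell", "B cell"])
source_b = ([[10.0, 0.0], [4.8, 5.2]], ["NK cell", "B cell"])
protos = cell_type_prototypes([source_a, source_b])
labels, k = cluster_target([[0.1, 0.0], [5.1, 4.9], [0.0, 0.2]], protos)
print(labels, k)  # ['T cell', 'B cell', 'T cell'] 2
```

In this toy run, the `NK cell` prototype from the second reference receives no target cells and so yields no cluster. That is one simple way to avoid requiring the cluster count up front. A realistic pipeline would also need a rule for target cells that match no reference cell type well.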

AVAILABILITY AND IMPLEMENTATION

The Python implementation of scMUSCL is available at https://github.com/arashkhoeini/scMUSCL.


Graphical abstract (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab90/12065430/8a8bf5b59a58/btaf137f1.jpg
