A scalable unsupervised learning of scRNAseq data detects rare cells through integration of structure-preserving embedding, clustering and outlier detection.

Affiliations

Computer Science and Engineering, RCC Institute of Information Technology, Canal South Road, 700015, West Bengal, India.

Centre for Economy and Growth, Observer Research Foundation, Rouse Avenue, New Delhi, 110002, Delhi, India.

Publication information

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad125.

Abstract

Single-cell RNA-seq analysis has become a powerful tool for analysing the transcriptomes of individual cells, and it has made it possible to screen thousands of single cells in parallel. Thus, in contrast to traditional bulk measurements, which paint only a macroscopic picture, gene measurements at the cell level help researchers study different tissues and organs at various stages. However, accurate clustering methods for such high-dimensional data remain scarce, and clustering is a persistent challenge in this domain. Several methods and techniques have recently been proposed to address this issue. In this article, we propose a novel framework for clustering large-scale single-cell data and subsequently identifying rare-cell sub-populations. To handle such sparse, high-dimensional data, we use PaCMAP (Pairwise Controlled Manifold Approximation), a feature extraction algorithm that preserves both the local and the global structure of the data, together with a Gaussian Mixture Model to cluster the single-cell data. Subsequently, we apply Edited Nearest Neighbours sampling and Isolation Forest/One-class Support Vector Machine to identify rare-cell sub-populations. The performance of the proposed method is validated on publicly available datasets with varying degrees of cell types and rare-cell sub-populations. On several benchmark datasets, the proposed method outperforms existing state-of-the-art methods. The proposed method successfully identifies cell types that make up between 0.1% and 8% of the population, with F1-scores of 0.91 ± 0.09. The source code is available at https://github.com/scrab017/RarPG.
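The embed, cluster, then detect-outliers sequence described in the abstract can be illustrated with a minimal sketch. This is not the authors' RarPG implementation (see the repository linked above for that). To keep the example dependent on NumPy and scikit-learn only, PCA is swapped in for PaCMAP, the Edited Nearest Neighbours step is omitted, and only the Isolation Forest variant is shown. The function names, the toy data, and all parameter values (`n_clusters`, `contamination`, the blob sizes) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture


def cluster_and_flag_rare(X, n_clusters=3, contamination=0.02, seed=0):
    """Embed cells, cluster them, and flag candidate rare cells.

    X is a (cells x genes) array. Parameter values are illustrative only.
    """
    # Stand-in embedding: PCA replaces PaCMAP here purely to limit dependencies.
    embedding = PCA(n_components=2, random_state=seed).fit_transform(X)

    # A Gaussian Mixture Model assigns each cell to one cluster.
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    labels = gmm.fit_predict(embedding)

    # Isolation Forest labels outliers as -1; treat those as rare-cell candidates.
    forest = IsolationForest(contamination=contamination, random_state=seed)
    is_rare = forest.fit_predict(embedding) == -1
    return embedding, labels, is_rare


def make_toy_data(seed=0):
    """Hypothetical data: two abundant populations and one small one."""
    rng = np.random.default_rng(seed)
    common_a = rng.normal(0.0, 1.0, size=(500, 20))
    common_b = rng.normal(4.0, 1.0, size=(500, 20))
    rare = rng.normal(-6.0, 0.5, size=(10, 20))
    return np.vstack([common_a, common_b, rare])


if __name__ == "__main__":
    X = make_toy_data()
    embedding, labels, is_rare = cluster_and_flag_rare(X)
    print("cells flagged as rare candidates:", int(is_rare.sum()))
```

In this sketch the `contamination` value fixes roughly what fraction of cells is flagged, so it would have to be tuned to the expected rare-cell proportion. How the published method chooses between Isolation Forest and One-class SVM, and where Edited Nearest Neighbours sampling enters, is not specified in the abstract.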
