scTPC: a novel semisupervised deep clustering model for scRNA-seq data.

Affiliations

School of Mathematical Sciences, Shenzhen University, Shenzhen, Guangdong 518000, China.

School of Mathematics, Renmin University of China, Haidian District, Beijing 100872, China.

Publication information

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae293.

Abstract

MOTIVATION

Continuous advancements in single-cell RNA sequencing (scRNA-seq) technology have enabled researchers to study cell heterogeneity, trajectory inference, rare cell type identification, and neurology in greater depth. Accurate clustering of scRNA-seq data is crucial for single-cell sequencing data analysis. However, the high dimensionality and sparsity of the data, together with "false" zero values (dropouts), make clustering difficult. Furthermore, current unsupervised clustering algorithms do not effectively leverage prior biological knowledge, which makes cell clustering even more challenging.

RESULTS

This study proposes scTPC, a deep-learning-based semisupervised clustering model that integrates a triplet constraint, a pairwise constraint, and a cross-entropy constraint. Specifically, the model first pretrains a denoising autoencoder based on a zero-inflated negative binomial (ZINB) distribution. Deep clustering is then performed in the learned latent feature space, using triplet and pairwise constraints generated from partially labeled cells. Finally, a weighted cross-entropy loss is introduced to optimize the model on datasets with imbalanced cell types. Experimental results on 10 real scRNA-seq datasets and five simulated datasets demonstrate that the scTPC framework achieves accurate clustering.
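The ZINB-based pretraining step can be illustrated by the negative log-likelihood that such an autoencoder typically minimizes. The sketch below is a NumPy illustration, not the scTPC code. It assumes the decoder outputs three quantities per entry, following the parameterization used by DCA-style autoencoders: a mean `mu`, a dispersion `theta`, and a dropout probability `pi`. The function name `zinb_nll` and the `eps` stabilizer are hypothetical.

```python
import math

import numpy as np

# Elementwise log-gamma; otypes avoids failures on empty inputs.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Mean negative log-likelihood of counts x under ZINB(mu, theta, pi).

    Hypothetical helper: mu = NB mean, theta = NB dispersion (inverse
    overdispersion), pi = probability of a technical ("false") zero.
    """
    x, mu, theta, pi = (np.asarray(a, dtype=float) for a in (x, mu, theta, pi))
    # log NB(x; mu, theta)
    log_nb = (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * np.log(theta / (theta + mu) + eps)
        + x * np.log(mu / (theta + mu) + eps)
    )
    # A zero can come from the dropout point mass or from the NB itself.
    nb_zero = np.power(theta / (theta + mu), theta)
    log_zero = np.log(pi + (1.0 - pi) * nb_zero + eps)
    # A nonzero count must come from the NB component.
    log_nonzero = np.log(1.0 - pi + eps) + log_nb
    log_lik = np.where(x < 0.5, log_zero, log_nonzero)
    return float(-np.mean(log_lik))


# Toy example: one cell, three genes.
counts = np.array([0.0, 1.0, 4.0])
print(zinb_nll(counts, mu=np.array([0.5, 1.0, 3.0]), theta=2.0, pi=0.1))
```

The zero branch is what lets the model attribute some zeros to dropout rather than to low expression, which is the motivation for ZINB over a plain negative binomial.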
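The triplet constraint can be sketched as a standard margin-based triplet loss on latent embeddings. Triplets are drawn from the partially labeled cells: the anchor and positive share a cell type, and the negative does not. This is an illustrative NumPy sketch under several assumptions: unlabeled cells are marked `-1`, distances are squared Euclidean, and the margin value and helper names are hypothetical. scTPC's actual sampling strategy may differ.

```python
import numpy as np


def sample_triplets(labels, n_triplets, rng):
    """Draw (anchor, positive, negative) index triplets from labeled cells.

    Hypothetical convention: labels[i] == -1 marks an unlabeled cell.
    """
    labels = np.asarray(labels)
    labeled = np.flatnonzero(labels >= 0)
    classes, counts = np.unique(labels[labeled], return_counts=True)
    if len(classes) < 2 or not np.any(counts >= 2):
        raise ValueError("need >=2 labeled classes and one class with >=2 cells")
    # Anchors must have at least one same-type partner.
    anchor_pool = labeled[np.isin(labels[labeled], classes[counts >= 2])]
    triplets = np.empty((n_triplets, 3), dtype=int)
    for t in range(n_triplets):
        a = rng.choice(anchor_pool)
        same = labeled[(labels[labeled] == labels[a]) & (labeled != a)]
        diff = labeled[labels[labeled] != labels[a]]
        triplets[t] = (a, rng.choice(same), rng.choice(diff))
    return triplets


def triplet_loss(z, triplets, margin=1.0):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin), squared Euclidean d."""
    a, p, n = z[triplets[:, 0]], z[triplets[:, 1]], z[triplets[:, 2]]
    d_ap = np.sum((a - p) ** 2, axis=1)
    d_an = np.sum((a - n) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_ap - d_an + margin)))


rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, -1])  # last cell is unlabeled
z = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1], [2.0, 2.0]])
trips = sample_triplets(labels, n_triplets=8, rng=rng)
print(triplet_loss(z, trips))  # → 0.0 (types already well separated)
```

Unlabeled cells never enter a triplet. They are shaped only indirectly, through the shared encoder.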
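For the pairwise constraints, one common formulation (similar to the one used in scDCC) works in three steps. Must-link pairs come from labeled cells of the same type, and cannot-link pairs come from cells of different types. Soft cluster assignments that disagree with either kind of pair are then penalized. The sketch below assumes DEC-style Student's t soft assignments in the latent space. Whether scTPC uses exactly this loss is an assumption, and all function names are hypothetical.

```python
import numpy as np


def soft_assign(z, centroids, alpha=1.0):
    """DEC-style Student's t soft assignment q[i, k] of cell i to cluster k."""
    d2 = np.sum((z[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def pairwise_constraints(labels):
    """Must-link (same type) and cannot-link (different type) index pairs.

    Hypothetical convention: labels[i] == -1 marks an unlabeled cell.
    """
    labels = np.asarray(labels)
    idx = np.flatnonzero(labels >= 0)
    i, j = np.triu_indices(len(idx), k=1)
    i, j = idx[i], idx[j]
    same = labels[i] == labels[j]
    must_link = np.stack([i[same], j[same]], axis=1)
    cannot_link = np.stack([i[~same], j[~same]], axis=1)
    return must_link, cannot_link


def pairwise_loss(q, must_link, cannot_link, eps=1e-10):
    """-log P(same cluster) for must-link, -log P(different) for cannot-link."""
    loss = 0.0
    if len(must_link):
        p_same = np.sum(q[must_link[:, 0]] * q[must_link[:, 1]], axis=1)
        loss += -np.mean(np.log(p_same + eps))
    if len(cannot_link):
        p_same = np.sum(q[cannot_link[:, 0]] * q[cannot_link[:, 1]], axis=1)
        loss += -np.mean(np.log(1.0 - p_same + eps))
    return float(loss)


labels = np.array([0, 0, 1, -1])
z = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [2.0, 2.0]])
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])
ml, cl = pairwise_constraints(labels)
print(ml.tolist(), cl.tolist())  # → [[0, 1]] [[0, 2], [1, 2]]
print(pairwise_loss(soft_assign(z, centroids), ml, cl))
```

Enumerating all labeled pairs grows quadratically with the number of labeled cells. A real implementation would likely subsample pairs.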
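The weighted cross-entropy term for imbalanced cell types can be illustrated with inverse-frequency class weights. Two details below are assumptions, since the abstract does not say how scTPC chooses its weights:

- The weighting scheme is w_c = N / (C · n_c), the "balanced" heuristic used by scikit-learn.
- The loss is normalized by the summed weights, which matches PyTorch's `CrossEntropyLoss` with `weight` and `reduction='mean'`.

The helper names are hypothetical.

```python
import numpy as np


def balanced_class_weights(labels, n_classes):
    """Inverse-frequency weights w_c = N / (C * n_c); absent classes get 0."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    weights = np.zeros(n_classes)
    present = counts > 0
    weights[present] = len(labels) / (n_classes * counts[present])
    return weights


def weighted_cross_entropy(probs, labels, weights, eps=1e-10):
    """Weighted mean of -log p(true class), normalized by the summed weights."""
    labels = np.asarray(labels)
    w = weights[labels]
    nll = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.sum(w * nll) / np.sum(w))


# Three cells of a common type (0) and one rare-type cell (1).
labels = np.array([0, 0, 0, 1])
weights = balanced_class_weights(labels, n_classes=2)
print(weights)  # rare class gets 3x the weight of the common class
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]])
print(weighted_cross_entropy(probs, labels, weights))
```

With these weights, misclassifying the single rare-type cell costs as much as misclassifying all three common-type cells. This keeps small cell populations from being absorbed into large clusters.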

AVAILABILITY AND IMPLEMENTATION

scTPC is a Python-based algorithm, and the code is available from https://github.com/LF-Yang/Code or https://zenodo.org/records/10951780.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8ffa/11091743/94678ac05fe2/btae293f1.jpg
