Accurate Single-Cell Clustering through Ensemble Similarity Learning.

Affiliations

Department of Mechatronics Engineering, Incheon National University, Incheon 22012, Korea.

Department of Mechanical Engineering, Dong-A University, Busan 49315, Korea.

Publication information

Genes (Basel). 2021 Oct 22;12(11):1670. doi: 10.3390/genes12111670.

DOI:10.3390/genes12111670

PMID:34828276

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8623803/

Abstract

Single-cell sequencing provides a novel means of interpreting the transcriptomic profiles of individual cells. Because single-cell sequencing techniques provide only the transcriptomic profile of each cell, in-depth analysis requires effective computational methods that accurately predict single-cell clusters. An accurate estimate of cell-to-cell similarity is an essential first step toward reliable clustering results, but it is difficult to obtain: the estimate depends heavily on which genes are selected for the similarity evaluation, and the optimal gene set is typically unknown. Moreover, owing to technical limitations, single-cell sequencing data include a large number of artificial zeros, and this technical noise makes it difficult to develop effective single-cell clustering algorithms. Here, we describe a novel single-cell clustering algorithm that accurately predicts single-cell clusters in large-scale single-cell sequencing data by effectively reducing the zero-inflated noise and accurately estimating the cell-to-cell similarities. First, we construct an ensemble similarity network based on different similarity estimates and reduce the artificial noise using a random walk with restart framework. Then, starting from a large number of small but highly consistent clusters, we iteratively merge the pair of clusters with the maximum similarity until the predicted number of clusters is reached. Extensive performance evaluation shows that the proposed algorithm yields accurate single-cell clustering results and can help decipher the key messages underlying complex biological mechanisms.
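
The three steps named in the abstract (ensemble similarity, random walk with restart, iterative merging) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: the ensemble averages Pearson correlations over random gene subsets, the restart probability is 0.5, merging uses the mean smoothed similarity between clusters and may start from one cluster per cell, and the target number of clusters is supplied by the caller rather than predicted. All function names are hypothetical.

```python
import numpy as np


def ensemble_similarity(X, n_subsets=20, subset_frac=0.5, seed=0):
    """Average cell-to-cell Pearson correlation over random gene subsets.

    X is a (cells x genes) expression matrix. Using random gene subsets
    as the ensemble members is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    k = max(2, int(subset_frac * n_genes))
    S = np.zeros((n_cells, n_cells))
    for _ in range(n_subsets):
        genes = rng.choice(n_genes, size=k, replace=False)
        # Constant rows produce NaN correlations; treat them as zero.
        S += np.nan_to_num(np.corrcoef(X[:, genes]))
    S /= n_subsets
    S = np.clip(S, 0.0, None)  # keep only positive similarities as edge weights
    np.fill_diagonal(S, 0.0)   # no self-loops in the similarity network
    return S


def random_walk_with_restart(S, restart=0.5):
    """Smooth a similarity network with a random walk with restart.

    With row-normalized transition matrix P and restart probability c,
    the stationary distributions are the rows of c * (I - (1 - c) P)^-1.
    """
    row_sums = S.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # isolated cells keep an all-zero row
    P = S / row_sums
    n = S.shape[0]
    R = restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * P)
    return (R + R.T) / 2.0  # symmetrize so it can serve as a similarity


def merge_clusters(R, labels, n_clusters):
    """Greedily merge the most similar pair of clusters until n_clusters remain."""
    labels = np.asarray(labels).copy()
    while len(np.unique(labels)) > n_clusters:
        ids = np.unique(labels)
        best, pair = -np.inf, None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Mean smoothed similarity between the members of a and b.
                score = R[np.ix_(labels == a, labels == b)].mean()
                if score > best:
                    best, pair = score, (a, b)
        labels[labels == pair[1]] = pair[0]
    return labels


if __name__ == "__main__":
    # Toy data: 3 groups of 10 cells, each with its own block of 20 marker genes.
    rng = np.random.default_rng(1)
    truth = np.repeat([0, 1, 2], 10)
    rates = np.full((30, 60), 0.5)
    for g in range(3):
        rates[truth == g, g * 20:(g + 1) * 20] = 10.0
    X = np.log1p(rng.poisson(rates))

    R = random_walk_with_restart(ensemble_similarity(X))
    print(merge_clusters(R, np.arange(30), n_clusters=3))
```

The pairwise search in `merge_clusters` is quadratic in the number of clusters at every step, so the sketch suits only small examples. Starting from small, consistent initial clusters instead of single cells, as the abstract describes, would shorten the merging phase.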

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8910/8623803/9d1a888d6988/genes-12-01670-g001.jpg

Similar articles

Accurate Single-Cell Clustering through Ensemble Similarity Learning.

Genes (Basel). 2021 Oct 22;12(11):1670. doi: 10.3390/genes12111670.

Effective single-cell clustering through ensemble feature selection and similarity measurements.

Comput Biol Chem. 2020 May 19;87:107283. doi: 10.1016/j.compbiolchem.2020.107283.

Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):660. doi: 10.1186/s12859-019-3179-5.

Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data.

Comput Biol Med. 2022 Jul;146:105697. doi: 10.1016/j.compbiomed.2022.105697. Epub 2022 Jun 8.

SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble.

Nucleic Acids Res. 2020 Jan 10;48(1):86-95. doi: 10.1093/nar/gkz959.

A Gene Rank Based Approach for Single Cell Similarity Assessment and Clustering.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):431-442. doi: 10.1109/TCBB.2019.2931582. Epub 2021 Apr 6.

GRACE: Graph autoencoder based single-cell clustering through ensemble similarity learning.

PLoS One. 2023 Apr 14;18(4):e0284527. doi: 10.1371/journal.pone.0284527. eCollection 2023.

Clustering ensemble in scRNA-seq data analysis: Methods, applications and challenges.

Comput Biol Med. 2023 Jun;159:106939. doi: 10.1016/j.compbiomed.2023.106939. Epub 2023 Apr 15.

Machine learning and statistical methods for clustering single-cell RNA-sequencing data.

Brief Bioinform. 2020 Jul 15;21(4):1209-1223. doi: 10.1093/bib/bbz063.

EnTSSR: A Weighted Ensemble Learning Method to Impute Single-Cell RNA Sequencing Data.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2781-2787. doi: 10.1109/TCBB.2021.3110850. Epub 2021 Dec 8.
