IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1773-1784. doi: 10.1109/TCBB.2019.2906601. Epub 2019 Mar 25.
In recent years, single-cell RNA sequencing (RNA-seq) has revealed diverse cell genetics at unprecedented resolution. These technological advances enable researchers to uncover functionally distinct cell subtypes, for example by identifying hematopoietic stem cell subpopulations. However, most related algorithms are hindered by the high dimensionality and sparsity of single-cell RNA-seq data. To address these problems, we propose MCANMF, a multiobjective evolutionary clustering algorithm based on adaptive non-negative matrix factorization, for single-cell RNA-seq data clustering. First, adaptive non-negative matrix factorization is proposed to decompose the data for feature extraction. Then, a multiobjective clustering algorithm based on learning vector quantization is proposed to analyze the single-cell RNA-seq data. To validate its effectiveness, we comprehensively benchmark MCANMF against 15 state-of-the-art methods (seven feature extraction methods, seven clustering methods, and a kernel-based similarity learning method) on six published single-cell RNA-seq datasets. MCANMF outperforms all 15 methods on these datasets according to multiple evaluation metrics. In addition, component analysis, time complexity analysis, and parameter analysis are conducted to demonstrate various properties of the proposed algorithm.
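The two-stage idea described above (factorize the expression matrix into low-dimensional non-negative features, then cluster cells in that space) can be illustrated with a minimal sketch. This is not MCANMF: it assumes plain NMF with standard multiplicative updates in place of the adaptive variant, and a basic k-means in place of the multiobjective evolutionary clustering based on learning vector quantization. The function names `nmf` and `kmeans`, the toy data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Plain NMF (not the adaptive variant): X ~ W @ H with W, H >= 0.

    Uses the standard multiplicative updates for the Frobenius loss.
    X is a non-negative (cells x genes) matrix.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def kmeans(Z, k, n_iter=50, seed=0):
    """Basic k-means, used here only as a stand-in clustering step."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every center.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Toy count matrix (assumed, not real data): 40 cells x 30 genes,
# two cell groups that each express a different half of the genes.
rng = np.random.default_rng(1)
rates = np.full((40, 30), 0.1)
rates[:20, :15] = 5.0
rates[20:, 15:] = 5.0
X = rng.poisson(rates).astype(float)

W, H = nmf(X, rank=2)      # W: per-cell features, H: per-gene loadings
labels = kmeans(W, k=2)    # cluster cells in the reduced feature space
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
print("cluster labels:", labels)
```

Clustering on the rows of `W` rather than on `X` directly is the point of the feature-extraction stage: each cell is described by a few non-negative factors instead of thousands of sparse gene counts.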