Jiao Cui-Na, Liu Jin-Xing, Wang Juan, Shang Junliang, Zheng Chun-Hou
IEEE J Biomed Health Inform. 2022 Apr;26(4):1872-1882. doi: 10.1109/JBHI.2021.3110766. Epub 2022 Apr 14.
Single-cell RNA sequencing (scRNA-seq) technology offers a new perspective for analyzing biological problems. One of the major applications of scRNA-seq data is discovering cell subtypes through cell clustering. However, traditional methods struggle with scRNA-seq data because of its high level of technical noise and frequent dropouts. To better analyze single-cell data, a novel scRNA-seq data analysis model called Maximum correntropy criterion based Non-negative and Low Rank Representation (MccNLRR) is introduced. Specifically, the maximum correntropy criterion, used as the loss function, is robust to the heavy noise and large outliers present in the data. Moreover, low rank representation has proven to be a powerful tool for capturing the global and local structures of data, so important information such as the similarity of cells in the subspace is also extracted. An iterative algorithm based on half-quadratic optimization and the alternating direction method is then developed to solve the resulting complex optimization problem. Before the experiments, we also analyze the convergence and robustness of MccNLRR. Finally, the results of cell clustering, visualization analysis, and marker gene selection on scRNA-seq data show that MccNLRR can distinguish cell subtypes accurately and robustly.
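The robustness attributed to the maximum correntropy criterion can be seen in a toy setting. The sketch below is not the MccNLRR algorithm and involves no low rank representation. It only estimates a single location parameter under a Gaussian-kernel correntropy loss, using a half-quadratic style alternation between auxiliary weights and a weighted least-squares update. The function names, the kernel width `sigma`, and the sample data are all assumptions made for illustration.

```python
import math


def correntropy_loss(residuals, sigma=1.0):
    """Correntropy-induced loss: mean of 1 - exp(-r^2 / (2 * sigma^2)).

    Each term is bounded by 1, so a single huge residual cannot dominate.
    """
    return sum(
        1.0 - math.exp(-r * r / (2.0 * sigma ** 2)) for r in residuals
    ) / len(residuals)


def mcc_location(values, sigma=1.0, n_iter=50, tol=1e-9):
    """Hypothetical helper: location estimate under the correntropy loss."""
    mu = sorted(values)[len(values) // 2]  # start from a median-like point
    for _ in range(n_iter):
        # Step 1: with mu fixed, compute the auxiliary (Gaussian) weights.
        weights = [
            math.exp(-((v - mu) ** 2) / (2.0 * sigma ** 2)) for v in values
        ]
        # Step 2: with weights fixed, solve the weighted least-squares problem.
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu


# Illustrative data (assumed): five values near 1.0 plus one gross outlier.
data = [0.9, 1.0, 1.1, 1.05, 0.95, 25.0]
mean_estimate = sum(data) / len(data)         # pulled toward the outlier
mcc_estimate = mcc_location(data, sigma=0.5)  # stays near the bulk of the data
```

The outlier receives a weight that is effectively zero, so the correntropy-based estimate stays close to the inliers, while the ordinary mean is dragged far away from them. The result does depend on the choice of `sigma`: a very large kernel width makes the weights nearly uniform and the estimate approaches the ordinary mean.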