GSTRPCA: irregular tensor singular value decomposition for single-cell multi-omics data clustering.

Author information

Cui Lubin, Guo Guiliang, Ng Michael K, Zou Quan, Qiu Yushan

Affiliations

School of Mathematics and Statistics, Henan Normal University, Xinxiang 453007, China.

Department of Mathematics, Hong Kong Baptist University, Hong Kong 999077, China.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae649.

Abstract

Single-cell multi-omics refers to the collection of multiple types of biological data at the single-cell level. These data provide high-resolution insight into cellular phenotypes, biological processes, and developmental stages. Integrating multiple omics layers holds high potential for breakthroughs. However, single-cell multi-omics data usually have different feature dimensions and are linked by direct or indirect relationships. Preserving the structure of these heterogeneous data while extracting hidden relationships is a major challenge for omics data integration, and effective integration models are urgently needed. In this paper, we propose an irregular tensor decomposition model (GSTRPCA) based on tensor robust principal component analysis (TRPCA). We developed a weighted threshold model for the decomposition of irregular tensor data by combining low-rank and sparsity constraints, which requires that the low-dimensional embeddings of the data remain low-rank and sparse. The major advantage of the GSTRPCA algorithm is its ability to keep the original data structure and explore hidden related features among omics data. For GSTRPCA, we also designed an effective algorithm that theoretically guarantees global convergence for the tensor decomposition. Computational experiments on irregular tensor datasets demonstrate that GSTRPCA significantly outperformed state-of-the-art methods, confirming its superiority in clustering single-cell multi-omics data. To our knowledge, this is the first tensor decomposition method for irregular tensor data that keeps the data structure and thereby improves clustering performance for single-cell multi-omics data. GSTRPCA is a MATLAB-based algorithm, and the code is available from https://github.com/GGL-B/GSTRPCA.
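The abstract mentions a weighted threshold model but does not spell out its form. As a rough illustration only, the sketch below shows generic weighted soft-thresholding, the shrinkage step commonly used in robust-PCA-style methods to promote low rank (when applied to singular values) or sparsity (when applied to entries). It is not the authors' MATLAB implementation: the function names, the reweighting rule, and the toy values are all assumptions made for this example.

```python
def weighted_soft_threshold(values, weights, tau):
    """Shrink each value toward zero by tau * weight.

    This is the proximal operator of a weighted l1 penalty. Values whose
    magnitude falls below tau * weight are set to zero.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    shrunk = []
    for v, w in zip(values, weights):
        magnitude = max(abs(v) - tau * w, 0.0)
        shrunk.append(magnitude if v >= 0 else -magnitude)
    return shrunk


def inverse_magnitude_weights(values, eps=1e-3):
    """Hypothetical weighting rule (an assumption, not from the paper):
    large values get small weights, so they are shrunk less."""
    return [1.0 / (abs(v) + eps) for v in values]


# Toy singular values of some matrix (made-up numbers for illustration).
singular_values = [10.0, 4.0, 0.5, 0.1]
weights = inverse_magnitude_weights(singular_values)
shrunk = weighted_soft_threshold(singular_values, weights, tau=1.0)

# The two dominant values survive almost unchanged; the two small ones
# are driven to zero, which lowers the rank of the reconstructed matrix.
print(shrunk)
```

With uniform weights this reduces to ordinary soft-thresholding. The weighting is what lets dominant components pass nearly unshrunk while small, noise-like components are removed.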
