Effective Integration of Single-Cell Multi-Omics Data Using Improved Network-Based Integrative Clustering with Multigraph Regularization.

Author information

Zhang Shunqin, Kong Wei, Wang Shuaiqun, Wei Kai, Liu Kun, Wen Gen, Yu Yaling

Affiliations

College of Information Engineering, Shanghai Maritime University, Shanghai, P.R. China.

Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, P.R. China.

Publication information

J Comput Biol. 2025 Jun;32(6):601-614. doi: 10.1089/cmb.2023.0460. Epub 2025 May 22.

Abstract

Integrating different omics data makes it possible to study cellular heterogeneity at the level of transcriptional regulation across multiple molecular layers, which helps identify cell types and reveal the pathogenesis of Alzheimer's disease (AD) from two perspectives. However, implementing such algorithms is challenging because of high noise levels, increased dimensionality, and computational complexity. In this study, multigraph regularization constraints were introduced into the network-based integrative clustering (NIC) algorithm, yielding MGR-NIC, which removes redundant features and preserves the geometric structure underlying the data when fusing two data types (snRNA-seq and snATAC-seq) from glial cells of AD samples. The effectiveness of MGR-NIC was validated on both simulated datasets and real datasets derived from various tissues. MGR-NIC can improve clustering accuracy by selecting features that better represent the structure of the dataset. The clusters obtained with MGR-NIC are strongly consistent with the clustering provided with the published dorsolateral prefrontal cortex (DLPFC) dataset, whereas the results generated by the NIC algorithm often show cluster overlap on the same dataset. MGR-NIC was compared in a comprehensive evaluation with several state-of-the-art algorithms, including NIC, scAI, Multi-Omics Factor Analysis v2, and JSNMF. In this comparison, MGR-NIC was the most stable and reliable method, which suggests that it is robust across different datasets and yields consistent, accurate results.
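
The abstract does not give the MGR-NIC objective, so the following is only a minimal sketch of the general idea of a multigraph regularization term, not the authors' implementation. It assumes the standard graph-regularization penalty from the manifold-learning literature, tr(H^T L H), summed over one k-nearest-neighbour graph per modality. The function names (`knn_graph`, `laplacian`, `multigraph_penalty`), the per-graph weights, and the random stand-in data are all hypothetical.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric binary k-nearest-neighbour adjacency over the rows (cells) of X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a cell is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]       # k closest cells per row
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                # symmetrize

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def multigraph_penalty(H, graphs, weights):
    """Weighted sum over graphs of tr(H^T L_v H), with H of shape cells x factors.

    Assumed form of the regularizer; the penalty is small when cells that are
    neighbours in a graph have similar low-dimensional representations.
    """
    return sum(w * np.trace(H.T @ laplacian(W) @ H)
               for W, w in zip(graphs, weights))

# Random stand-ins for two modalities measured on the same 30 cells.
rng = np.random.default_rng(0)
rna = rng.normal(size=(30, 5))    # placeholder for snRNA-seq features
atac = rng.normal(size=(30, 8))   # placeholder for snATAC-seq features
H = rng.random((30, 3))           # placeholder shared cell embedding

graphs = [knn_graph(rna, 5), knn_graph(atac, 5)]
penalty = multigraph_penalty(H, graphs, [0.5, 0.5])
```

In a factorization-based integration method, a term of this kind would typically be added to the reconstruction loss so that the learned embedding keeps the neighbourhood structure of each modality; how MGR-NIC weights and optimizes it is described in the full paper, not here.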
