Mu Yudi, Li Wei Vivian
Department of Statistics, University of California, Riverside, Riverside, CA 92521, United States.
Bioinform Adv. 2025 Aug 9;5(1):vbaf189. doi: 10.1093/bioadv/vbaf189. eCollection 2025.
The growing availability of single-cell RNA sequencing (scRNA-seq) data highlights the necessity for robust integration methods to uncover both shared and unique cellular features across samples. These datasets often exhibit technical variations and biological differences, complicating integrative analyses. While numerous integration methods have been proposed, many fail to account for individual-level covariates or are limited to discrete variables.
To address these limitations, we propose scINSIGHT2, a generalized linear latent variable model that accommodates both continuous covariates, such as age, and discrete factors, such as disease conditions. Through both simulation studies and real-data applications, we demonstrate that scINSIGHT2 accurately harmonizes scRNA-seq datasets, whether from single or multiple sources. These results highlight scINSIGHT2's utility in capturing meaningful biological insights from scRNA-seq data while accounting for individual-level variation.
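For orientation, a generic generalized linear latent variable model can be written as below. This is a textbook-style sketch, not the exact specification of scINSIGHT2: the symbols $y_{ij}$, $x_i$, $z_i$, $\beta_j$, $\lambda_j$, the link function $g$, and the standard normal prior on the latent factors are all assumptions introduced here for illustration. The covariate vector $x_i$ may combine continuous entries (such as age) with indicator-coded discrete factors (such as disease condition).

```latex
% Illustrative generic GLLVM; notation is assumed, not taken from the paper.
% y_{ij}: expression count of gene j in cell i
% x_i:    observed covariates (continuous and indicator-coded discrete)
% z_i:    d-dimensional latent factors for cell i
\begin{align}
  g\bigl(\mathbb{E}[\,y_{ij} \mid x_i, z_i\,]\bigr)
    &= \beta_{0j} + x_i^{\top}\beta_j + z_i^{\top}\lambda_j, \\
  z_i &\sim \mathcal{N}(0, I_d).
\end{align}
```

In a model of this form, the term $x_i^{\top}\beta_j$ absorbs covariate-driven variation, while $z_i$ carries the low-dimensional structure shared across cells.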
The scINSIGHT2 method has been implemented as an R package, which is available at https://github.com/yudimu/scINSIGHT2/.