Liu Tianyi, Liu Chuwen, Li Quefeng, Zheng Xiaojing, Zou Fei
Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
Department of Pediatrics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
NAR Genom Bioinform. 2025 Apr 26;7(2):lqaf046. doi: 10.1093/nargab/lqaf046. eCollection 2025 Jun.
Accurate deconvolution of cell types from bulk gene expression is crucial for understanding cellular compositions and uncovering cell-type specific differential expression and physiological states of diseased tissues. Existing deconvolution methods have limitations, such as requiring complete cellular gene expression signatures or neglecting partial biological information. Moreover, these methods often overlook varying cell-type messenger RNA amounts, leading to biased proportion estimates. Additionally, they do not effectively utilize valuable reference information from external studies, such as means and ranges of population cell-type proportions. To address these challenges, we introduce an adaptive regularized tri-factor non-negative matrix factorization approach for deconvolution (ARTdeConv). We rigorously establish the numerical convergence of our algorithm. Through benchmark simulations, we demonstrate the superior performance of ARTdeConv compared to state-of-the-art semi-reference-based and reference-free methods as well as its robustness under challenges to its assumptions. In a real-world application to a dataset from a trivalent influenza vaccine study, our method accurately estimates cellular proportions, as evidenced by the nearly perfect Pearson's correlation between ARTdeConv estimates and flow cytometry measurements. Moreover, our analysis of ARTdeConv estimates in COVID-19 patients reveals patterns consistent with important immunological phenomena observed in other studies. The proposed method, ARTdeConv, is implemented as an R package and can be accessed on GitHub for researchers and practitioners.