Song Qianqian, Zhu Xuewei, Jin Lingtao, Chen Minghan, Zhang Wei, Su Jing
Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Atrium Health Wake Forest Baptist, Winston-Salem, NC 27157, USA.
Department of Internal Medicine, Section on Molecular Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA.
NAR Genom Bioinform. 2022 Jul 27;4(3):lqac056. doi: 10.1093/nargab/lqac056. eCollection 2022 Sep.
Unravelling the regulatory programs from single-cell multi-omics data has long been one of the major challenges in genomics, especially in the emerging single-cell field. There is currently a large gap between fast-growing single-cell multi-omics data and effective methods for the integrative analysis of these inherently sparse and heterogeneous data. In this study, we have developed a novel method, the Single-cell Multi-omics Gene co-Regulatory algorithm (SMGR), to detect coherent functional regulatory signals and target genes from joint single-cell RNA-sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data obtained from different samples. Given that scRNA-seq and scATAC-seq data can be modeled by a zero-inflated Negative Binomial distribution, we utilize a generalized linear regression model to identify the latent representation of consistently expressed genes and peaks, thus enabling the identification of co-regulatory programs and the elucidation of regulatory mechanisms. Results from both simulation and experimental data demonstrate that SMGR outperforms existing methods with considerably improved accuracy. To illustrate the biological insights SMGR can provide, we apply it to mixed-phenotype acute leukemia (MPAL) and identify an MPAL-specific regulatory program with significant peak-gene links, which greatly enhances our understanding of the regulatory mechanisms and potential targets of this complex tumor.
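For readers unfamiliar with the zero-inflated Negative Binomial (ZINB) distribution mentioned above, the following is a minimal sketch of its probability mass function in the common mean/dispersion parameterization. It is not taken from the SMGR implementation: the function names `nb_log_pmf` and `zinb_pmf` and the parameters `mu` (mean), `theta` (dispersion) and `pi` (probability of an excess zero) are illustrative assumptions. In a generalized linear model, `mu` would typically be tied to covariates through a log link, which is likewise an assumption here rather than a detail stated in the abstract.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-probability of count y under a Negative Binomial
    with mean mu > 0 and dispersion theta > 0 (hypothetical helper)."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_pmf(y, mu, theta, pi):
    """Probability of count y under a zero-inflated Negative Binomial.

    With probability pi the observation is an excess ("dropout") zero;
    otherwise it is drawn from NB(mu, theta).
    """
    nb = math.exp(nb_log_pmf(y, mu, theta))
    if y == 0:
        # Zeros come from both the inflation component and the NB itself.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


if __name__ == "__main__":
    # Illustrative parameter values only; not fitted to any data.
    for count in range(4):
        print(count, zinb_pmf(count, mu=2.0, theta=1.5, pi=0.3))
```

The extra mass at zero is what lets this family accommodate the sparsity of single-cell count matrices, where many entries are zero for technical rather than biological reasons.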