Wang Tianyu, Nabavi Sheida
Computer Science and Engineering, University of Connecticut, Storrs, USA.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2017 Nov;2017:202-207. doi: 10.1109/bibm.2017.8217650. Epub 2017 Dec 18.
Differential gene expression analysis is a central task in single-cell RNA sequencing (scRNAseq) analysis, used to discover changes in expression levels that are specific to individual cell types. Because scRNAseq data exhibit multimodality, large numbers of zero counts, and sparsity, they differ from traditional bulk RNA sequencing (RNAseq) data. These challenges have motivated the development of new methods for identifying differentially expressed (DE) genes. In this study, we propose a new method, SigEMD, that combines a logistic regression model with a nonparametric method based on Earth Mover's Distance to identify DE genes in scRNAseq data precisely and efficiently. The regression model reduces the impact of the large number of zero counts, and the nonparametric method improves the sensitivity of detecting DE genes in multimodal scRNAseq data. By additionally using gene interaction network information to adjust the final states of DE genes, we further reduce the number of false positives among the called DE genes. We used simulated and real data to evaluate the detection accuracy of the proposed method and to compare its performance with that of other differential expression analysis methods. The results indicate that the proposed method performs well overall in terms of precision, sensitivity, and specificity.
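To make the nonparametric component concrete, the following is a minimal sketch of how an Earth Mover's Distance between the expression values of one gene in two cell groups might be computed and tested for significance. It is not the SigEMD implementation: the function names `emd_1d` and `permutation_pvalue` are hypothetical, the permutation test is an assumption (the abstract does not state how significance is assessed), and the logistic regression and gene interaction network steps are omitted.

```python
import numpy as np


def emd_1d(x, y):
    """Earth Mover's Distance between two 1-D samples with equal weights.

    For one-dimensional data this equals the area between the two
    empirical cumulative distribution functions.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    all_values = np.sort(np.concatenate([x, y]))
    deltas = np.diff(all_values)  # widths of the intervals between points
    # Empirical CDF of each sample evaluated on every interval.
    cdf_x = np.searchsorted(x, all_values[:-1], side="right") / len(x)
    cdf_y = np.searchsorted(y, all_values[:-1], side="right") / len(y)
    return float(np.sum(np.abs(cdf_x - cdf_y) * deltas))


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Assumed significance test: shuffle group labels and recompute the EMD."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = emd_1d(x, y)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        if emd_1d(shuffled[:n_x], shuffled[n_x:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Illustrative synthetic values only, not data from the study.
    rng = np.random.default_rng(1)
    group_a = rng.normal(loc=2.0, scale=0.5, size=50)
    # A bimodal second group, mimicking multimodal expression.
    group_b = np.concatenate([
        rng.normal(loc=2.0, scale=0.5, size=25),
        rng.normal(loc=5.0, scale=0.5, size=25),
    ])
    distance, p_value = permutation_pvalue(group_a, group_b)
    print(f"EMD = {distance:.3f}, permutation p-value = {p_value:.4f}")
```

Because the distance compares whole distributions rather than only their means, a gene whose second group splits into two modes can still yield a large distance, which is the kind of multimodal difference the abstract refers to.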