School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac144.
The ubiquitous dropout problem in single-cell RNA sequencing (scRNA-seq) technology introduces a large amount of noise into gene expression profiles. To address this, we propose an evolutionary sparse imputation (ESI) algorithm for single-cell transcriptomes, scESI, which constructs a sparse representation model based on gene regulation relationships between cells. To solve this model, we design an optimization framework based on a nondominated sorting genetic algorithm. This framework takes into account the topological relationship between cells and the diversity of gene expression to iteratively search for the globally optimal solution, thereby learning the Pareto optimal cell-cell affinity matrix. Finally, we use the learned sparse relationship model between cells to improve data quality and reduce data noise. On simulated datasets, scESI performed significantly better than benchmark methods across various metrics. Applying scESI to real scRNA-seq datasets, we found that it not only further resolves cell types and separates cells clearly in visualizations but also improves the reconstruction of differentiation trajectories and the identification of differentially expressed genes. In addition, scESI successfully recovered the expression trends of marker genes in stem cell differentiation and can discover new cell types and putative pathways regulating biological processes.
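The final step of the abstract, using a sparse cell-cell relationship to fill in dropouts, can be illustrated with a minimal sketch. This is not the scESI implementation: the affinity matrix here comes from a plain k-nearest-neighbor rule that stands in for the Pareto optimal matrix learned by the nondominated sorting genetic framework, and the function names `knn_affinity` and `impute_zeros`, the parameter `k`, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def knn_affinity(X, k=1):
    """Sparse, row-normalized cell-cell affinity from k nearest neighbors.

    Assumption: a simple stand-in for a learned affinity matrix.
    X has shape (cells, genes).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between cells.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a cell is never its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        W[i, nbrs] = 1.0 / k  # equal weight per neighbor, rows sum to 1
    return W

def impute_zeros(X, W):
    """Replace zero entries with the affinity-weighted average of other cells.

    Nonzero (observed) entries are left unchanged.
    """
    X = np.asarray(X, dtype=float)
    X_hat = W @ X
    return np.where(X == 0, X_hat, X)

# Toy data: 4 cells x 3 genes, with two zeros treated as dropouts.
X = np.array([
    [5.0, 0.0, 3.0],
    [5.0, 2.0, 3.0],
    [0.0, 8.0, 1.0],
    [1.0, 8.0, 1.0],
])
W = knn_affinity(X, k=1)
X_imputed = impute_zeros(X, W)
print(X_imputed)
```

In this toy case each cell borrows values only from its single nearest neighbor, so the two zero entries are filled from the most similar cell while observed values are kept. A real pipeline would also need to decide which zeros are true dropouts rather than genuine absence of expression, which this sketch does not attempt.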