Computer Science Research Centre, University of Surrey, Guildford, UK.
BMC Bioinformatics. 2024 Oct 26;25(1):339. doi: 10.1186/s12859-024-05946-9.
Gene interaction networks are graphs in which nodes represent genes and edges represent functional interactions between them. These interactions can occur at multiple levels, for instance gene regulation, protein-protein interaction, or metabolic pathways. To analyse gene interaction networks at a large scale, gene co-expression network analysis is often applied to high-throughput gene expression data such as RNA sequencing data. With advances in sequencing technology, gene expression can now be measured in individual cells. Single-cell RNA sequencing (scRNAseq) provides insights into cellular development, differentiation and characteristics at the transcriptomic level. However, the high sparsity and high dimensionality of scRNAseq data pose challenges for its analysis.
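As an illustration of co-expression network construction in general (not the authors' method), the sketch below links two genes when the absolute Pearson correlation of their expression profiles reaches a threshold. The function names and the 0.8 threshold are hypothetical choices made for illustration. Because marginal correlation also links genes that are connected only through a third gene, such a network mixes direct and indirect associations, which is the motivation for estimating the inverse covariance matrix instead.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if either gene is constant across samples.
    return sxy / math.sqrt(sxx * syy)

def coexpression_network(expr, threshold=0.8):
    """Build an undirected co-expression graph as {(gene1, gene2): correlation}.

    expr maps gene name -> expression values over the same samples or cells.
    An edge is kept when |correlation| >= threshold (hypothetical cut-off).
    """
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges
```

For example, with four toy genes where geneB is a scaled copy of geneA, geneC is geneA reversed and geneD is uncorrelated with the others, the graph contains the edges geneA-geneB, geneA-geneC and geneB-geneC, and geneD stays isolated.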
In this study, a sparse inverse covariance matrix estimation framework for scRNAseq data is developed to capture direct functional interactions between genes. Comparative analyses on simulated scRNAseq data highlight the high performance and fast computation of Stein-type shrinkage on high-dimensional data. Data transformation approaches also improve the performance of shrinkage methods on non-Gaussian data. Zero-inflated modelling of scRNAseq data based on a negative binomial distribution enhances shrinkage performance on zero-inflated data without adversely affecting performance on count data that are not zero-inflated.
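A minimal sketch of Stein-type shrinkage followed by inversion, assuming the Schäfer-Strimmer variant that shrinks the sample correlation matrix towards the identity with an analytically estimated intensity lambda. Inverting the shrunk matrix gives partial correlations; in a Gaussian graphical model a zero partial correlation means two genes are conditionally independent given all others, so non-zero entries indicate direct interactions. The paper's exact estimator, shrinkage target and any sparsification or significance step may differ; all function names here are hypothetical, and plain Python lists are used so the example has no dependencies.

```python
import math

def standardize(values):
    """Centre a gene's values and scale by the sample SD (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]  # fails if a gene is constant (sd == 0)

def shrink_correlation(X):
    """Shrink the sample correlation of X (n cells x p genes) towards the identity.

    Returns (shrunk matrix, lambda) with
    lambda = sum_{i!=j} Var(r_ij) / sum_{i!=j} r_ij^2, clipped to [0, 1].
    """
    n, p = len(X), len(X[0])
    cols = [standardize([row[j] for row in X]) for j in range(p)]
    R = [[1.0] * p for _ in range(p)]
    num = den = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            w = [a * b for a, b in zip(cols[i], cols[j])]
            w_bar = sum(w) / n
            r = n / (n - 1) * w_bar  # sample correlation r_ij
            var_r = n / (n - 1) ** 3 * sum((wk - w_bar) ** 2 for wk in w)
            R[i][j] = R[j][i] = r
            num += var_r
            den += r * r
    lam = 1.0 if den == 0.0 else max(0.0, min(1.0, num / den))
    # Off-diagonals are pulled towards 0; the diagonal stays at 1.
    shrunk = [[1.0 if i == j else (1.0 - lam) * R[i][j] for j in range(p)]
              for i in range(p)]
    return shrunk, lam

def invert(A):
    """Gauss-Jordan inversion with partial pivoting (assumes A is invertible)."""
    p = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[p:] for row in M]

def partial_correlations(R):
    """Partial correlations -w_ij / sqrt(w_ii * w_jj) from the inverse W of R."""
    W = invert(R)
    p = len(W)
    return [[1.0 if i == j else -W[i][j] / math.sqrt(W[i][i] * W[j][j])
             for j in range(p)] for i in range(p)]
```

With fewer cells than genes (n < p) the raw correlation matrix is singular, but any lambda > 0 makes the shrunk matrix positive definite and therefore invertible, which is why shrinkage suits high-dimensional data.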
The proposed framework broadens the application of graphical models in scRNAseq analysis, offering flexibility with respect to the sparsity of count data caused by dropout events, high performance, and fast computation. The framework is implemented as a reproducible Snakemake workflow (https://github.com/calathea24/ZINBGraphicalModel) and as the R package ZINBStein (https://github.com/calathea24/ZINBStein).
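To make the dropout-driven sparsity concrete, the sketch below evaluates the probability mass function of a zero-inflated negative binomial (ZINB) distribution: with probability pi a count is a structural zero (for example a dropout), otherwise it is drawn from a negative binomial with mean mu and dispersion (size) theta. This is a generic textbook parameterisation with hypothetical function and parameter names, not the ZINBStein implementation.

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu > 0 and dispersion (size) theta > 0."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, theta, pi):
    """ZINB pmf: extra probability mass pi is placed on zero (structural zeros)."""
    base = (1.0 - pi) * nb_pmf(y, mu, theta)
    return pi + base if y == 0 else base
```

Setting pi = 0 recovers the ordinary negative binomial, so counts without excess zeros are modelled exactly as before, while pi > 0 adds probability mass at zero and scales the mean down to (1 - pi) * mu.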