SILGGM: An extensive R package for efficient statistical inference in large-scale gene networks.

Affiliations

Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.

Division of Pulmonary Medicine; Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.

Publication information

PLoS Comput Biol. 2018 Aug 13;14(8):e1006369. doi: 10.1371/journal.pcbi.1006369. eCollection 2018 Aug.

Abstract

Gene co-expression network analysis is extremely useful for interpreting complex biological processes. Recent droplet-based single-cell technology routinely generates much larger gene expression data sets, with thousands of samples and tens of thousands of genes. To analyze such large-scale gene-gene networks, remarkable progress has been made in rigorous statistical inference for high-dimensional Gaussian graphical models (GGMs). These approaches provide a formal confidence interval or a p-value, rather than only a single point estimate, for the conditional dependence of a gene pair, and are therefore more desirable for identifying reliable gene networks. To promote their widespread use, we herein introduce an extensive and efficient R package named SILGGM (Statistical Inference of Large-scale Gaussian Graphical Model) that includes four main approaches to statistical inference in high-dimensional GGMs. Unlike existing tools, SILGGM provides statistically efficient inference both for individual gene pairs and for all gene pairs simultaneously. It has a novel and consistent false discovery rate (FDR) procedure in all four methodologies. With its user-friendly design, it provides outputs compatible with multiple platforms for interactive network visualization. Furthermore, comparisons in simulation illustrate that SILGGM is several orders of magnitude faster than the existing MATLAB implementation and further improves on the speed of the already very efficient R package FastGGM. Testing results from the simulated data confirm the validity of all the approaches in SILGGM, even in a very large-scale setting with the number of variables or genes on the order of ten thousand. We have also applied our package to a novel single-cell RNA-seq data set of pan T cells. The results show that the approaches in SILGGM significantly outperform the conventional ones in a biological sense. The package is freely available via CRAN at https://cran.r-project.org/package=SILGGM.
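
To make the idea of edge-wise inference in a GGM concrete, the following is a minimal Python sketch of the classical low-dimensional analogue: partial correlations read off the inverse sample covariance, Fisher z-test p-values for each variable pair, and Benjamini-Hochberg FDR control. It is not one of the four SILGGM methods and does not use the SILGGM API. It assumes more samples than variables (n > p + 1), which is exactly the assumption the high-dimensional approaches in the package are designed to relax. All function names, the simulated chain network, and the parameter values are illustrative assumptions.

```python
import numpy as np
from math import erfc, sqrt

def partial_correlations(x):
    """Partial correlation matrix from the inverse sample covariance (needs n > p)."""
    precision = np.linalg.inv(np.cov(x, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def fisher_z_pvalues(pcor, n):
    """Two-sided p-values for H0: partial correlation = 0 (Fisher z-transform)."""
    p = pcor.shape[0]
    iu = np.triu_indices(p, k=1)           # each unordered pair once
    r = np.clip(pcor[iu], -0.999999, 0.999999)
    # Each pair is conditioned on the other p - 2 variables.
    z = np.arctanh(r) * np.sqrt(n - (p - 2) - 3)
    pvals = np.array([erfc(abs(v) / sqrt(2.0)) for v in z])
    return iu, pvals

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank passing its threshold
        reject[order[:k + 1]] = True
    return reject

# Illustrative data: a 5-variable chain network (hypothetical, simulated).
rng = np.random.default_rng(0)
p, n = 5, 2000
true_precision = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
x = rng.multivariate_normal(np.zeros(p), np.linalg.inv(true_precision), size=n)

pcor = partial_correlations(x)
iu, pvals = fisher_z_pvalues(pcor, n)
reject = benjamini_hochberg(pvals, alpha=0.05)
edges = [(int(i), int(j)) for i, j, r in zip(iu[0], iu[1], reject) if r]
print(edges)
```

When the number of genes approaches or exceeds the number of cells, the sample covariance is no longer invertible and this direct approach breaks down, which is the setting the abstract describes.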
