Information-incorporated gene network construction with FDR control.

Affiliations

Department of Statistics, Iowa State University, Ames, IA 50010, United States.

Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, United States.

Publication information

Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae125.

Abstract

MOTIVATION

Large-scale gene expression studies make it possible to construct gene networks that uncover associations among genes. To study direct associations, networks based on partial correlations are preferred over those based on marginal correlations. However, false discovery rate (FDR) control for partial correlation-based network construction is not well studied. In addition, currently available partial correlation-based methods cannot incorporate existing biological knowledge into network construction while controlling the FDR.
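To make the distinction between marginal and partial correlation concrete, the following minimal Python sketch uses the standard first-order partial correlation formula for three variables. The gene labels and correlation values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y after adjusting for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical marginal correlations (illustrative values only):
# genes A and B are each correlated with gene C, and their own
# marginal correlation equals the product of those two correlations.
r_ab, r_ac, r_bc = 0.49, 0.7, 0.7

rho_ab_given_c = partial_corr(r_ab, r_ac, r_bc)
print(f"marginal r(A,B) = {r_ab:.2f}")
print(f"partial  r(A,B | C) = {rho_ab_given_c:.2f}")
```

In this toy setting the marginal correlation between A and B is clearly nonzero, yet it vanishes (up to rounding) once C is accounted for, so a partial correlation network would not draw a direct A-B edge.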

RESULTS

In this paper, we propose a method called Partial Correlation Graph with Information Incorporation (PCGII). PCGII estimates the partial correlation between each pair of genes by regularized node-wise regression, which can incorporate prior knowledge while controlling for the effects of all other genes. It handles high-dimensional data, where the number of genes can be much larger than the sample size, and controls the FDR at the same time. We compare PCGII with several existing approaches through extensive simulation studies and demonstrate that PCGII has better FDR control and higher power. We apply PCGII to a plant gene expression dataset, where it recovers confirmed regulatory relationships and a hub node, as well as several direct associations that shed light on potential functional relationships in the system. We also introduce a method that supplements the observed data with a pseudogene so that PCGII can be applied when no prior information is available; this also allows FDR control and power to be checked in real data analysis.
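As a rough illustration of what controlling the FDR over candidate edges means, the sketch below applies the generic Benjamini-Hochberg step-up procedure to a list of edge p-values. This is an assumption made for illustration only: the abstract does not say which FDR procedure PCGII uses, and the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at nominal FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])

# Hypothetical p-values, one per candidate edge (illustrative only).
edge_p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
selected_edges = benjamini_hochberg(edge_p_values, alpha=0.05)
print(selected_edges)
```

With these illustrative inputs only the two smallest p-values fall below their rank-dependent thresholds, so only those two edges would be kept in the network.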

AVAILABILITY AND IMPLEMENTATION

The R package PCGII is freely available for download at https://cran.r-project.org/package=PCGII.

