
SEMbap: Bow-free covariance search and data de-correlation.

Affiliation

Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy.

Publication information

PLoS Comput Biol. 2024 Sep 11;20(9):e1012448. doi: 10.1371/journal.pcbi.1012448. eCollection 2024 Sep.

DOI:10.1371/journal.pcbi.1012448
PMID:39259748
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11419354/
Abstract

Large-scale studies of gene expression are commonly influenced by biological and technical sources of expression variation, including batch effects, sample characteristics, and environmental impacts. Learning the causal relationships between observable variables can be challenging in the presence of unobserved confounders, and many high-dimensional regression techniques may perform worse in this setting. Controlling for unobserved confounding variables is therefore essential, and many deconfounding methods have been proposed for a variety of situations. The main contribution of this article is a two-stage deconfounding procedure based on Bow-free Acyclic Paths (BAP) search, developed within the framework of Structural Equation Models (SEM) and called SEMbap(). In the first stage, an exhaustive search for missing edges with significant covariance is performed via Shipley d-separation tests. In the second stage, either a Constrained Gaussian Graphical Model (CGGM) is fitted, or a low-dimensional representation of the bow-free edge structure is obtained via Graph Laplacian Principal Component Analysis (gLPCA). We compare four popular deconfounding methods with the BAP search approach on simulated and observed expression data. In the simulations, different structures of the hidden covariance matrix are replicated. Compared to existing methods, the BAP search algorithm correctly identifies hidden confounding whilst controlling the false positive rate and achieving good fitting and perturbation metrics.
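
The first stage described in the abstract can be illustrated with a minimal sketch. It is not the SEMbap() implementation from the SEMgraph R package. It assumes Gaussian data, takes the conditioning set for each missing edge (i, j) to be the union of the parents of i and j (the usual Shipley basis set), tests the partial correlation with a Fisher z-test, and applies a Bonferroni correction chosen only for simplicity. The function names (`corr_matrix`, `partial_corr`, `bow_free_search`, `simulate`) and the toy DAG are hypothetical.

```python
import math
import random
from itertools import combinations


def corr_matrix(data):
    """Pearson correlation matrix for rows = samples, columns = variables."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cen = [[row[j] - means[j] for j in range(p)] for row in data]
    ss = [[sum(r[a] * r[b] for r in cen) for b in range(p)] for a in range(p)]
    return [[ss[a][b] / math.sqrt(ss[a][a] * ss[b][b]) for b in range(p)]
            for a in range(p)]


def partial_corr(R, i, j, cond):
    """Partial correlation of i and j given cond, via the recursive formula."""
    if not cond:
        return R[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(R, i, j, rest)
    r_ik = partial_corr(R, i, k, rest)
    r_jk = partial_corr(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def bow_free_search(data, parents, alpha=0.01):
    """Test every missing edge (i, j) given pa(i) | pa(j).

    parents maps each node index to the set of its parents in the input DAG.
    Returns {(i, j): p_value} for pairs significant after Bonferroni correction
    (an assumption of this sketch, not taken from the paper).
    """
    n, p = len(data), len(data[0])
    R = corr_matrix(data)
    missing = [(i, j) for i, j in combinations(range(p), 2)
               if i not in parents[j] and j not in parents[i]]
    flagged = {}
    for i, j in missing:
        cond = sorted(parents[i] | parents[j])
        r = partial_corr(R, i, j, cond)
        z = math.atanh(r) * math.sqrt(n - len(cond) - 3)  # Fisher z statistic
        p_value = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p
        if p_value < alpha / len(missing):
            flagged[(i, j)] = p_value
    return flagged


def simulate(n=400, seed=0):
    """Toy data: DAG 0->1, 0->2, 2->3 plus a hidden factor h acting on 1 and 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        h = rng.gauss(0, 1)                      # unobserved confounder
        x0 = rng.gauss(0, 1)
        x1 = 0.8 * x0 + h + rng.gauss(0, 0.5)
        x2 = 0.6 * x0 + h + rng.gauss(0, 0.5)
        x3 = 0.7 * x2 + rng.gauss(0, 0.5)
        rows.append([x0, x1, x2, x3])
    return rows


parents = {0: set(), 1: {0}, 2: {0}, 3: {2}}
flagged = bow_free_search(simulate(), parents)
print(flagged)
```

In this toy setup the pair (1, 2) has no directed edge but shares the hidden factor `h`, so its partial correlation given node 0 remains large and the pair is flagged as a candidate bow-free covariance. The second stage (CGGM fitting or gLPCA) is not covered by this sketch.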
