Higher-order correction of persistent batch effects in correlation networks.

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States.

Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, United States.

Publication information

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae531.

DOI: 10.1093/bioinformatics/btae531
PMID: 39226186
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11441315/
Abstract

MOTIVATION

Systems biology analyses often use correlations in gene expression profiles to infer co-expression networks, which then serve as input for gene regulatory network inference or are used to identify functional modules of co-expressed or putatively co-regulated genes. Systematic biases, including batch effects, are known to induce spurious associations and confound differential gene expression (DE) analyses, but the impact of batch effects on gene co-expression has not been fully explored. Existing methods adjust expression values so that, for each gene, the mean and variance are conditionally independent of batch or other covariates, which improves the fidelity of DE analysis. However, such adjustments do not address the potential for spurious differential co-expression (DC) between groups. Consequently, uncorrected, artifactual DC can skew the correlation structure and lead to the identification of false, non-biological associations, even when the input data have been corrected using standard batch correction.
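
The point that per-gene mean and variance adjustment leaves batch-specific co-expression untouched can be illustrated with a small synthetic example. The sketch below is only an illustration under stated assumptions: the two-gene setup, the batch parameters, and the helper names `simulate_batch` and `standardize` are hypothetical, and per-batch standardization stands in for a standard batch correction method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # samples per batch (assumed for illustration)

def simulate_batch(rho, shift, scale):
    """Draw two genes with correlation rho, then apply a batch shift and scale."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    return x * scale + shift

# Batch A: genes strongly co-expressed. Batch B: genes uncorrelated,
# with a different mean and variance.
batch_a = simulate_batch(rho=0.8, shift=0.0, scale=1.0)
batch_b = simulate_batch(rho=0.0, shift=3.0, scale=2.0)

def standardize(x):
    """Per-gene location/scale adjustment within one batch."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

corrected_a = standardize(batch_a)
corrected_b = standardize(batch_b)

# Means and variances now agree across batches, but the gene-gene
# correlation still differs by batch.
corr_a = np.corrcoef(corrected_a, rowvar=False)[0, 1]
corr_b = np.corrcoef(corrected_b, rowvar=False)[0, 1]
print(f"corrected batch A correlation: {corr_a:.2f}")
print(f"corrected batch B correlation: {corr_b:.2f}")
```

After the adjustment each gene has mean 0 and variance 1 in both batches, yet the two batches still disagree on the correlation between the genes, which is the kind of residual differential co-expression the abstract describes.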

RESULTS

In this work, we demonstrate the persistence of confounders in covariance after standard batch correction using synthetic and real-world gene expression data examples. We then introduce Co-expression Batch Reduction Adjustment (COBRA), a method for computing a batch-corrected gene co-expression matrix based on estimating a conditional covariance matrix. COBRA estimates a reduced set of parameters expressing the co-expression matrix as a function of the sample covariates, allowing control for continuous and categorical covariates. COBRA is computationally efficient, leveraging the inherently modular structure of genomic data to estimate accurate gene regulatory associations and facilitate functional analysis for high-dimensional genomic data.
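
The idea of expressing a co-expression matrix as a function of sample covariates through a reduced set of parameters can be sketched roughly as follows. This is a simplified illustration and not the netZoo implementation of COBRA: it assumes a shared eigenbasis taken from the pooled covariance, regresses squared eigen-projections on a design matrix by ordinary least squares, and uses hypothetical names (`simulate`, `design`, `psi`, `covariance_for`) and synthetic two-gene data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000  # samples per batch (assumed for illustration)

def simulate(rho):
    """Two zero-mean, unit-variance genes with correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n)

# Samples x genes expression matrix: batch 0 is co-expressed, batch 1 is not.
expr = np.vstack([simulate(0.8), simulate(0.0)])
batch = np.repeat([0.0, 1.0], n)
design = np.column_stack([np.ones(2 * n), batch])  # intercept + batch indicator

# Shared eigenbasis from the pooled covariance.
centered = expr - expr.mean(axis=0)
pooled_cov = centered.T @ centered / centered.shape[0]
eigvals, Q = np.linalg.eigh(pooled_cov)

# Reduced parameter set: one coefficient per (covariate, eigen-component),
# obtained by regressing squared projections on the design matrix.
scores_sq = (centered @ Q) ** 2
psi, *_ = np.linalg.lstsq(design, scores_sq, rcond=None)

def covariance_for(x):
    """Covariance implied for a sample with covariate vector x."""
    return Q @ np.diag(np.asarray(x) @ psi) @ Q.T

cov_batch0 = covariance_for([1.0, 0.0])
cov_batch1 = covariance_for([1.0, 1.0])
print(np.round(cov_batch0, 2))
print(np.round(cov_batch1, 2))
```

In this toy setting the number of fitted coefficients grows with the number of covariates times the number of retained components, not with the number of gene pairs, and continuous covariates could be added as further columns of `design`. The covariance evaluated at the batch-0 covariates recovers the strong co-expression, while the one evaluated at the batch-1 covariates is close to the identity.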

AVAILABILITY AND IMPLEMENTATION

COBRA is available under the GPL-3 open source license in R and Python in netZoo (https://netzoo.github.io).
