Transcriptome data are insufficient to control false discoveries in regulatory network inference.

Affiliations

Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA.

Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA; Institute for Cell Engineering, Johns Hopkins Medicine, Baltimore, MD, USA; Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA.

Publication information

Cell Syst. 2024 Aug 21;15(8):709-724.e13. doi: 10.1016/j.cels.2024.07.006.

DOI:10.1016/j.cels.2024.07.006

PMID:39173585

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11642480/

Abstract

Inference of causal transcriptional regulatory networks (TRNs) from transcriptomic data suffers notoriously from false positives. Approaches to control the false discovery rate (FDR), for example, via permutation, bootstrapping, or multivariate Gaussian distributions, suffer from several complications: difficulty in distinguishing direct from indirect regulation, nonlinear effects, and causal structure inference requiring "causal sufficiency," meaning experiments that are free of any unmeasured, confounding variables. Here, we use a recently developed statistical framework, model-X knockoffs, to control the FDR while accounting for indirect effects, nonlinear dose-response, and user-provided covariates. We adjust the procedure to estimate the FDR correctly even when measured against incomplete gold standards. However, benchmarking against chromatin immunoprecipitation (ChIP) and other gold standards reveals higher observed than reported FDR. This indicates that unmeasured confounding is a major driver of FDR in TRN inference. A record of this paper's transparent peer review process is included in the supplemental information.
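To make the idea of knockoff-based FDR control concrete, the sketch below shows the standard "knockoff+" selection step from the general knockoff filter literature. It is not taken from this paper and does not include the authors' adjustment for incomplete gold standards. It assumes each candidate regulator already has a statistic W_j, where a large positive value means the real variable looked more important than its knockoff copy. The function names and the example values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the smallest t > 0 such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (nothing is selected)."""
    # Only the distinct nonzero magnitudes of W need to be checked.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # knockoff "wins"
        positives = sum(1 for x in w if x >= t)   # real-variable "wins"
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices of variables selected at target FDR level q."""
    tau = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= tau]


# Hypothetical statistics for ten candidate regulators (illustrative only).
example_w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 0.5, -0.2, 1.5]
print(knockoff_select(example_w, q=0.2))  # prints [0, 1, 2, 3, 4, 5, 9]
```

The count of negative statistics stands in for the unknown number of false positives, which is why the guarantee depends on the knockoffs being valid copies of the measured variables. Unmeasured confounders break that assumption, which is consistent with the abstract's finding that observed FDR exceeds reported FDR.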
