Inferring the experimental design for accurate gene regulatory network inference.

Authors

Seçilmiş Deniz, Hillerton Thomas, Nelander Sven, Sonnhammer Erik L L

Affiliations

Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna 17121, Sweden.

Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, SE-75185 Uppsala, Sweden.

Publication details

Bioinformatics. 2021 Oct 25;37(20):3553-3559. doi: 10.1093/bioinformatics/btab367.

DOI:10.1093/bioinformatics/btab367

PMID:33978748

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8545292/

Abstract

MOTIVATION

Accurate inference of gene regulatory interactions is important for understanding the mechanisms of underlying biological processes. For gene expression data gathered from targeted perturbations, gene regulatory network (GRN) inference methods that use the perturbation design are the top-performing methods. However, the connection between the perturbation design and gene expression can be obscured by problems such as experimental noise or off-target effects, limiting the methods' ability to reconstruct the true GRN.

RESULTS

In this study, we propose an algorithm, IDEMAX, to infer the effective perturbation design from gene expression data in order to eliminate the potential risk of fitting a disconnected perturbation design to gene expression. We applied IDEMAX to synthetic data from two different data generation tools, GeneNetWeaver and GeneSPIDER, and assessed its effect on the experiment design matrix as well as the accuracy of the GRN inference, followed by application to a real dataset. The results show that our approach consistently improves the accuracy of GRN inference compared to using the intended perturbation design when much of the signal is hidden by noise, which is often the case for real data.

AVAILABILITY AND IMPLEMENTATION

https://bitbucket.org/sonnhammergrni/idemax.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
