Inference of differentially expressed genes using generalized linear mixed models in a pairwise fashion.

Author information

Laboratório de Bioinformática, Laboratório Nacional de Computação Científica, Petrópolis, Rio de Janeiro, Brazil.

Centro de Estudo em Telecomunicações, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil.

Publication information

PeerJ. 2023 Apr 3;11:e15145. doi: 10.7717/peerj.15145. eCollection 2023.

Abstract

BACKGROUND

Technological advances in RNA-Seq and bioinformatics make it possible to quantify the transcriptional levels of genes in cells, tissues, and cell lines, and thus to identify differentially expressed genes (DEGs). DESeq2 and edgeR are well-established computational tools for this purpose; both are based on generalized linear models (GLMs) that consider only fixed effects. However, including random effects reduces the risk of missing potential DEGs that may be essential to the biological phenomenon under investigation. Generalized linear mixed models (GLMMs) can include both kinds of effects.
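To make the contrast between a GLM and a GLMM concrete, the two linear predictors can be written side by side. This is only a notational sketch: the log link, the indices (gene g, sample i), and the symbols for the design vectors and the random-effect covariance are assumptions introduced here, not notation taken from the abstract.

```latex
\begin{aligned}
\text{GLM:}\quad  & \log \mu_{gi} = \mathbf{x}_i^{\top}\boldsymbol{\beta}_g \\
\text{GLMM:}\quad & \log \mu_{gi} = \mathbf{x}_i^{\top}\boldsymbol{\beta}_g + \mathbf{z}_i^{\top}\mathbf{b}_g,
\qquad \mathbf{b}_g \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_g)
\end{aligned}
```

Here the fixed-effect coefficients are shared across all samples, while the random-effect term lets the expected count shift per individual.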

METHODS

We present DEGRE (Differentially Expressed Genes with Random Effects), a user-friendly tool that infers DEGs when the experimental design of an RNA-Seq study includes both fixed effects and random effects on individuals. DEGRE preprocesses the raw matrices before fitting GLMMs to the genes, and the resulting regression coefficients are tested with the Wald test. DEGRE offers the Benjamini-Hochberg or Bonferroni method for p-value adjustment.
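The testing and adjustment steps can be illustrated with a minimal sketch. It is not DEGRE's implementation: the function names, the normal approximation used for the Wald p-value, and the per-gene coefficients and standard errors are all hypothetical and chosen only for illustration.

```python
import math


def wald_p_value(beta, se):
    """Two-sided Wald p-value for H0: beta = 0 (normal approximation, an assumption)."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (coefficient, standard error) pairs; not results from the paper.
estimates = {"geneA": (2.1, 0.5), "geneB": (0.3, 0.4), "geneC": (-1.5, 0.45)}
genes = list(estimates)
raw = [wald_p_value(*estimates[g]) for g in genes]
for gene, p, p_bh, p_bonf in zip(genes, raw, benjamini_hochberg(raw), bonferroni(raw)):
    print(f"{gene}: raw={p:.4g} BH={p_bh:.4g} Bonferroni={p_bonf:.4g}")
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the false discovery rate and usually retains more genes.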

RESULTS

The datasets used to assess DEGRE were simulated so that the true DEGs were known. These datasets contain fixed effects, and random effects were estimated and inserted to measure the impact of experimental designs with high biological variability. For DEG inference, the preprocessing effectively prepares the data and retains overdispersed genes. The biological coefficient of variation is inferred from the count matrices to assess variability before and after preprocessing. DEGRE is computationally validated by measuring its performance on simulated count matrices whose biological variability is related to fixed and random effects. DEGRE also provides improved assessment measures for detecting DEGs in cases with higher biological variability. We show that the preprocessing established here effectively removes technical variation from those matrices. The tool also detects new potential candidate DEGs in transcriptome data from patients with bipolar disorder, which makes it a promising tool for detecting more relevant genes.
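As a rough illustration of what a biological coefficient of variation measures, the sketch below uses a simple method-of-moments estimate under a negative binomial assumption, where the variance is mu + phi * mu^2 and the BCV is the square root of the dispersion phi. The abstract does not state which estimator was used, so this is a stand-in: the function name and the example counts are hypothetical, and library-size normalization is ignored.

```python
import math
from statistics import mean, variance


def bcv_method_of_moments(counts):
    """Estimate the BCV of one gene from replicate counts.

    Assumes Var(y) = mu + phi * mu**2 (negative binomial), so
    phi = (s^2 - mean) / mean^2 and BCV = sqrt(phi).
    A negative estimate (no overdispersion beyond Poisson) is clipped to 0.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # sample variance with n - 1 denominator
    dispersion = max((s2 - m) / (m * m), 0.0)
    return math.sqrt(dispersion)


# Hypothetical replicate counts for two genes (illustrative only).
stable_gene = [98, 102, 100, 101]
variable_gene = [50, 100, 150]
print(bcv_method_of_moments(stable_gene))
print(bcv_method_of_moments(variable_gene))
```

A gene whose replicates vary no more than Poisson noise gets a BCV of 0, while an overdispersed gene gets a larger value; comparing such values before and after preprocessing is one way to see how filtering changes the variability of a matrix.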

CONCLUSIONS

DEGRE provides data preprocessing and applies GLMMs for DEG inference. The preprocessing efficiently removes genes that could affect the inference. In addition, the computational and biological validation of DEGRE has shown promise in identifying possible DEGs in experiments derived from complex experimental designs. This tool may help handle random effects on individuals in the inference of DEGs and has the potential to uncover new DEGs of interest for further biological investigation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35bc/10078460/1aa658044dc3/peerj-11-15145-g001.jpg
