Investigating the effect of dependence between conditions with Bayesian Linear Mixed Models for motif activity analysis.

Affiliations

Data Science, Radboud University, Institute for Computing and Information Sciences, Nijmegen, The Netherlands.

Molecular Developmental Biology, Radboud University, Research Institute for Molecular Life Sciences, Nijmegen, The Netherlands.

Publication information

PLoS One. 2020 May 1;15(5):e0231824. doi: 10.1371/journal.pone.0231824. eCollection 2020.

DOI: 10.1371/journal.pone.0231824
PMID: 32357166
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7194367/

Abstract

MOTIVATION

Cellular identity and behavior are controlled by complex gene regulatory networks. Transcription factors (TFs) bind to specific DNA sequences to regulate the transcription of their target genes. On the basis of these TF motifs in cis-regulatory elements, we can model the influence of TFs on gene expression. Such models of TF motif activity usually assume a linear relationship between motif activity and gene expression level. A commonly used method to model motif influence is based on Ridge Regression. One important assumption of linear regression is independence between samples. However, if samples are generated from the same cell line, tissue, or other biological source, this assumption may be invalid. The same assumption of independence is also applied to different yet similar experimental conditions, which may also be inappropriate. In theory, assuming independence between samples could lead to a loss in signal detection. Here we investigate whether a Bayesian model that allows for correlations results in more accurate inference of motif activities.
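The Ridge Regression baseline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `ridge_motif_activity`, the matrix names `M` (genes by motifs) and `Y` (genes by conditions), the fixed penalty `lam`, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def ridge_motif_activity(M, Y, lam):
    """Ridge estimate of motif activities (illustrative sketch).

    M   : (genes x motifs) motif scores per gene (hypothetical input)
    Y   : (genes x conditions) expression levels (hypothetical input)
    lam : ridge penalty, assumed given here
    """
    n_motifs = M.shape[1]
    # Solve (M^T M + lam * I) A = M^T Y.
    # Each column of Y (one condition) is fitted independently of the others,
    # which is the independence assumption discussed in the text.
    return np.linalg.solve(M.T @ M + lam * np.eye(n_motifs), M.T @ Y)

# Simulated toy data: 200 genes, 5 motifs, 3 conditions.
rng = np.random.default_rng(0)
M = rng.normal(size=(200, 5))
A_true = rng.normal(size=(5, 3))
Y = M @ A_true + 0.1 * rng.normal(size=(200, 3))

A_hat = ridge_motif_activity(M, Y, lam=1.0)
print(A_hat.shape)
```

Because the penalty matrix is shared and the right-hand side is solved column by column, fitting one condition alone gives the same activities as fitting all conditions together. No information is shared between conditions in this baseline.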

RESULTS

We extend Ridge Regression to a Bayesian Linear Mixed Model, which allows us to model dependence between different samples. In a simulation study, we investigate the differences between the two model assumptions. We show that our Bayesian Linear Mixed Model implementation outperforms Ridge Regression in a simulation scenario where the noise, which is the signal that cannot be explained by TF motifs, is uncorrelated. However, we demonstrate that there is no such gain in performance if the noise has a similar covariance structure over samples as the signal that can be explained by motifs. We give a mathematical explanation of why this is the case. Using four representative real datasets, we show that at most ~40% of the signal is explained by motifs using the linear model. With these data there is no advantage to using the Bayesian Linear Mixed Model, due to the similarity of the covariance structure.
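One way to see why a shared covariance structure removes the advantage is to compare posterior means numerically. The sketch below assumes a matrix-normal setup chosen for illustration (activities with covariance `V` across conditions, noise with covariance `C` across conditions, identity covariance across motifs and genes). The paper's exact parametrization may differ, and all names here are hypothetical.

```python
import numpy as np

def ridge(M, Y, lam):
    # Standard ridge estimate, fitted independently per condition.
    return np.linalg.solve(M.T @ M + lam * np.eye(M.shape[1]), M.T @ Y)

def correlated_posterior_mean(M, Y, V, C):
    """Posterior mean of A in Y = M A + E (illustrative assumptions):
    vec(A) ~ N(0, V kron I) and vec(E) ~ N(0, C kron I), column-stacked vec."""
    m, c = M.shape[1], Y.shape[1]
    C_inv = np.linalg.inv(C)
    V_inv = np.linalg.inv(V)
    # Posterior precision: C^-1 kron M^T M  +  V^-1 kron I
    precision = np.kron(C_inv, M.T @ M) + np.kron(V_inv, np.eye(m))
    # (C^-1 kron M^T) vec(Y) = vec(M^T Y C^-1), since C is symmetric
    rhs = (M.T @ Y @ C_inv).reshape(-1, order="F")
    return np.linalg.solve(precision, rhs).reshape((m, c), order="F")

rng = np.random.default_rng(1)
M = rng.normal(size=(30, 4))   # 30 genes, 4 motifs (toy sizes)
Y = rng.normal(size=(30, 3))   # 3 conditions
V = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.5],
              [0.3, 0.5, 1.0]])  # assumed correlation between conditions

ridge_fit = ridge(M, Y, lam=2.0)
same_structure = correlated_posterior_mean(M, Y, V, 2.0 * V)       # noise covariance proportional to V
uncorrelated = correlated_posterior_mean(M, Y, V, 2.0 * np.eye(3))  # uncorrelated noise

print(np.allclose(same_structure, ridge_fit))
print(np.allclose(uncorrelated, ridge_fit))
```

Under these assumptions, when `C` is a scalar multiple of `V` the Kronecker factors cancel and the posterior mean reduces to the ridge estimate with that scalar as the penalty, so modeling the correlation changes nothing. With uncorrelated noise the two estimates differ, which is the setting where the correlated model can help.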

AVAILABILITY & IMPLEMENTATION

The project implementation is available at https://github.com/Sim19/SimGEXPwMotifs.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ad1/7194367/f69821e38ae9/pone.0231824.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ad1/7194367/76efd62df191/pone.0231824.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ad1/7194367/a4073da23083/pone.0231824.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ad1/7194367/395d69aeb37d/pone.0231824.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ad1/7194367/0a265428bfb3/pone.0231824.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ad1/7194367/6b80eaa9e81b/pone.0231824.g006.jpg
