
A graphical model method for integrating multiple sources of genome-scale data.

Authors

Dvorkin Daniel, Biehs Brian, Kechris Katerina

Affiliation

Computational Bioscience Program, University of Colorado School of Medicine, 12801 E. 17th Ave., Aurora, CO 80045–0511, USA.

Publication information

Stat Appl Genet Mol Biol. 2013 Aug;12(4):469-87. doi: 10.1515/sagmb-2012-0051.

DOI: 10.1515/sagmb-2012-0051
PMID: 23934610
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4867227/

Abstract

Making effective use of multiple data sources is a major challenge in modern bioinformatics. Genome-wide data such as measures of transcription factor binding, gene expression, and sequence conservation, which are used to identify binding regions and genes that are important to major biological processes such as development and disease, can be difficult to use together due to the different biological meanings and statistical distributions of the heterogeneous data types, but each can provide valuable information for understanding the processes under study. Here we present methods for integrating multiple data sources to gain a more complete picture of gene regulation and expression. Our goal is to identify genes and cis-regulatory regions which play specific biological roles. We describe a graphical mixture model approach for data integration, examine the effect of using different model topologies, and discuss methods for evaluating the effectiveness of the models. Model fitting is computationally efficient and produces results which have clear biological and statistical interpretations. The Hedgehog and Dorsal signaling pathways in Drosophila, which are critical in embryonic development, are used as examples.
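
To make the idea of a mixture model over several data sources concrete, here is a minimal sketch. It is not the authors' model: the abstract does not specify the graphical topologies, distributions, or fitting procedure used in the paper. The sketch assumes the simplest possible topology, a single hidden binary "target / non-target" label per gene, with each data source (column) treated as an independent Gaussian given that label, fitted by EM. The function name `fit_two_class_mixture` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_two_class_mixture(X, n_iter=200):
    """EM for a two-class latent-class mixture (illustrative assumption, not the paper's model).

    X has one row per gene and one column per data source. Given the hidden
    class, the columns are modelled as independent Gaussians.
    Returns mixing weights, per-class means, per-class variances, and the
    posterior probability that each row belongs to class 1.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Deterministic start: rows with a high average z-score lean towards class 1.
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    resp = 1.0 / (1.0 + np.exp(-z.mean(axis=1)))
    for _ in range(n_iter):
        # M-step: weighted proportions, means, and variances for both classes.
        r = np.column_stack([1.0 - resp, resp])          # shape (n, 2)
        nk = np.maximum(r.sum(axis=0), 1e-12)            # guard against empty classes
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]                     # shape (2, d)
        var = (r.T @ X**2) / nk[:, None] - mu**2 + 1e-6  # small floor for stability
        # E-step: posterior class probabilities, computed in log space.
        diff = X[:, None, :] - mu[None, :, :]            # shape (n, 2, d)
        log_lik = -0.5 * (np.log(2.0 * np.pi * var)[None, :, :]
                          + diff**2 / var[None, :, :]).sum(axis=2)
        log_post = np.log(pi)[None, :] + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        resp = post[:, 1] / post.sum(axis=1)
    return pi, mu, var, resp

# Synthetic example: 400 "genes", 3 data sources, the first 100 genes are shifted.
rng = np.random.default_rng(0)
is_target = np.zeros(400, dtype=bool)
is_target[:100] = True
X = rng.standard_normal((400, 3))
X[is_target] += 3.0

pi, mu, var, resp = fit_two_class_mixture(X)
called = resp > 0.5
accuracy = (called == is_target).mean()
print("estimated target fraction:", pi[1])
print("agreement with simulated labels:", accuracy)
```

The posterior probabilities in `resp` are what would give such a model its statistical interpretation: each gene receives a probability of belonging to the "target" class that pools evidence from all sources. Richer topologies, for example one hidden variable per data source linked to a common parent, would replace the single hidden label assumed here.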

Similar articles

1
A graphical model method for integrating multiple sources of genome-scale data.
Stat Appl Genet Mol Biol. 2013 Aug;12(4):469-87. doi: 10.1515/sagmb-2012-0051.
2
Bayesian hierarchical error model for analysis of gene expression data.
Bioinformatics. 2004 Sep 1;20(13):2016-25. doi: 10.1093/bioinformatics/bth192. Epub 2004 Mar 25.
3
An empirical Bayes approach to inferring large-scale gene association networks.
Bioinformatics. 2005 Mar;21(6):754-64. doi: 10.1093/bioinformatics/bti062. Epub 2004 Oct 12.
4
An empirical Bayes' approach to joint analysis of multiple microarray gene expression studies.
Biometrics. 2011 Dec;67(4):1617-26. doi: 10.1111/j.1541-0420.2011.01602.x. Epub 2011 Apr 22.
5
Incorporating prior information via shrinkage: a combined analysis of genome-wide location data and gene expression data.
Stat Med. 2007 May 10;26(10):2258-75. doi: 10.1002/sim.2703.
6
Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data.
BMC Bioinformatics. 2008 Apr 21;9:203. doi: 10.1186/1471-2105-9-203.
7
A GMM-IG framework for selecting genes as expression panel biomarkers.
Artif Intell Med. 2010 Feb-Mar;48(2-3):75-82. doi: 10.1016/j.artmed.2009.07.006. Epub 2009 Dec 8.
8
Fast Bayesian inference in large Gaussian graphical models.
Biometrics. 2019 Dec;75(4):1288-1298. doi: 10.1111/biom.13064. Epub 2019 May 6.
9
An order estimation based approach to identify response genes for microarray time course data.
Stat Appl Genet Mol Biol. 2012 Dec 14;11(6):/j/sagmb.2012.11.issue-6/1544-6115.1818/1544-6115.1818.xml. doi: 10.1515/1544-6115.1818.
10
CAGER: classification analysis of gene expression regulation using multiple information sources.
BMC Bioinformatics. 2005 May 12;6:114. doi: 10.1186/1471-2105-6-114.

Cited by

1
Evaluation of hierarchical models for integrative genomic analyses.
Bioinformatics. 2016 Mar 1;32(5):738-46. doi: 10.1093/bioinformatics/btv653. Epub 2015 Nov 5.
2
The discordant method: a novel approach for differential correlation.
Bioinformatics. 2016 Mar 1;32(5):690-6. doi: 10.1093/bioinformatics/btv633. Epub 2015 Oct 31.
3
DNA methylation and childhood asthma in the inner city.
J Allergy Clin Immunol. 2015 Jul;136(1):69-80. doi: 10.1016/j.jaci.2015.01.025. Epub 2015 Mar 11.

References

1
Unsupervised pattern discovery in human chromatin structure through genomic segmentation.
Nat Methods. 2012 Mar 18;9(5):473-6. doi: 10.1038/nmeth.1937.
2
A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data.
Am J Bot. 2012 Feb;99(2):248-56. doi: 10.3732/ajb.1100340. Epub 2012 Jan 20.
3
FlyBase 101--the basics of navigating FlyBase.
Nucleic Acids Res. 2012 Jan;40(Database issue):D706-14. doi: 10.1093/nar/gkr1030. Epub 2011 Nov 29.
4
KEGG for integration and interpretation of large-scale molecular data sets.
Nucleic Acids Res. 2012 Jan;40(Database issue):D109-14. doi: 10.1093/nar/gkr988. Epub 2011 Nov 10.
5
Integrating diverse genomic data using gene sets.
Genome Biol. 2011 Oct 21;12(10):R105. doi: 10.1186/gb-2011-12-10-r105.
6
ChIP-Array: combinatory analysis of ChIP-seq/chip and microarray gene expression data to discover direct/indirect targets of a transcription factor.
Nucleic Acids Res. 2011 Jul;39(Web Server issue):W430-6. doi: 10.1093/nar/gkr332. Epub 2011 May 17.
7
Hedgehog targets in the Drosophila embryo and the mechanisms that generate tissue-specific outputs of Hedgehog signaling.
Development. 2010 Nov;137(22):3887-98. doi: 10.1242/dev.055871.
8
The UCSC Genome Browser database: update 2011.
Nucleic Acids Res. 2011 Jan;39(Database issue):D876-82. doi: 10.1093/nar/gkq963. Epub 2010 Oct 18.
9
Next-generation genomics: an integrative approach.
Nat Rev Genet. 2010 Jul;11(7):476-86. doi: 10.1038/nrg2795.
10
Genome-wide identification of hypoxia-inducible factor binding sites and target genes by a probabilistic model integrating transcription-profiling data and in silico binding site prediction.
Nucleic Acids Res. 2010 Apr;38(7):2332-45. doi: 10.1093/nar/gkp1205. Epub 2010 Jan 8.