Bayesian variable selection for gene expression modeling with regulatory motif binding sites in neuroinflammatory events.

Author information

Liu Kuang-Yu, Zhou Xiaobo, Kan Kinhong, Wong Stephen T C

Affiliations

HCNR -- Center for Bioinformatics, Harvard Medical School, Boston, Massachusetts 02215, USA.

Publication information

Neuroinformatics. 2006 Winter;4(1):95-117. doi: 10.1385/NI:4:1:95.

PMID:16595861

Abstract

Multiple transcription factors (TFs) coordinately control transcriptional regulation of genes in eukaryotes. Although numerous computational methods focus on identifying individual TF-binding sites (TFBSs), very few consider the interdependence among these sites. In this article, we studied the relationship between TFBSs and microarray gene expression levels using both family-wise and member-specific motifs, under various combinations of regression models with Bayesian variable selection, as well as motif scoring and sharing conditions, in order to account for the coordination complexity of transcription regulation. We proposed a three-step approach to model the relationship. In the first step, we preprocessed microarray data and used p-values and expression ratios to preselect upregulated and downregulated genes. The second step aimed to identify and score individual TFBSs within the DNA sequence of each gene. A method based on the degree of similarity and the number of TFBSs was employed to calculate the score of each TFBS in each gene sequence. In the last step, linear regression and probit regression were used to build a predictive model of gene expression outcomes using these TFBSs as predictors. Given a certain number of predictors to be used, a full search of all possible predictor sets is usually combinatorially prohibitive. Therefore, this article considered Bayesian variable selection for prediction with either of the regression models. Bayesian variable selection has previously been applied in the context of gene selection, missing value estimation, and regulatory motif identification. In our modeling, the regressor was approximated as a linear combination of the TFBSs, and a Gibbs sampler was employed to find the strongest TFBSs. We applied these regression models with Bayesian variable selection to a spinal cord injury gene expression data set. These TFs demonstrated intricate regulatory roles, either as a family or as individual members, in neuroinflammatory events. Our analysis can be applied to create plausible hypotheses for combinatorial regulation by TFBSs while avoiding false-positive candidates in the modeling process. Such a systematic approach makes it possible to dissect, from a more comprehensive perspective, the transcription regulation through which phenotypic events at the cellular and tissue levels are driven by molecular events at the gene transcription and translation levels.
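
The last step described in the abstract, selecting the strongest TFBS predictors with a Gibbs sampler instead of searching every predictor subset, can be illustrated with a minimal sketch for the linear regression case. The abstract does not give the priors or sampler details, so everything specific here is an assumption made for illustration: a Zellner g-prior on the coefficients with g equal to the number of genes, an independent Bernoulli prior on each inclusion indicator, and the hypothetical names `log_marginal`, `gibbs_select`, `scores`, and `expr`. The data are synthetic, not the spinal cord injury data set.

```python
import numpy as np

def log_marginal(X, y, gamma, g):
    """Log marginal likelihood of y (up to a constant) for the predictor
    subset flagged in gamma, assuming a Zellner g-prior (an assumption)."""
    n = len(y)
    idx = np.flatnonzero(gamma)
    yty = float(y @ y)
    if idx.size == 0:
        return -0.5 * n * np.log(yty)
    Xg = X[:, idx]
    # Least squares gives the projection of y onto the selected columns.
    coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    fit = float(y @ (Xg @ coef))
    shrunk_rss = yty - g / (1.0 + g) * fit
    return -0.5 * idx.size * np.log1p(g) - 0.5 * n * np.log(shrunk_rss)

def gibbs_select(X, y, n_iter=600, burn_in=200, g=None, prior_incl=0.2, seed=0):
    """Estimate posterior inclusion probabilities of each column of X."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    X = X - X.mean(axis=0)   # centering removes the intercept
    y = y - y.mean()
    g = float(n) if g is None else g
    prior_log_odds = np.log(prior_incl) - np.log1p(-prior_incl)
    gamma = np.zeros(p, dtype=bool)   # start from the empty model
    counts = np.zeros(p)
    for it in range(n_iter):
        for j in range(p):
            # Compare the model with and without predictor j,
            # holding all other indicators fixed.
            gamma[j] = True
            with_j = log_marginal(X, y, gamma, g)
            gamma[j] = False
            without_j = log_marginal(X, y, gamma, g)
            log_odds = np.clip(with_j - without_j + prior_log_odds, -50, 50)
            prob = 1.0 / (1.0 + np.exp(-log_odds))
            gamma[j] = rng.random() < prob
        if it >= burn_in:
            counts += gamma
    return counts / (n_iter - burn_in)

# Synthetic example: 100 genes, 8 hypothetical TFBS scores,
# of which only columns 0 and 3 influence expression.
data_rng = np.random.default_rng(1)
scores = data_rng.normal(size=(100, 8))
expr = 2.0 * scores[:, 0] - 1.5 * scores[:, 3] + data_rng.normal(size=100)
probs = gibbs_select(scores, expr)
print(np.round(probs, 2))
```

Each sweep updates one inclusion indicator at a time from its conditional posterior, so the sampler visits high-probability predictor subsets without enumerating all of them. The probit variant mentioned in the abstract would additionally sample a latent continuous response for each gene; that step is omitted here.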
