A classification-based framework for predicting and analyzing gene regulatory response.

Authors

Kundaje Anshul, Middendorf Manuel, Shah Mihir, Wiggins Chris H, Freund Yoav, Leslie Christina

Affiliation

Department of Computer Science, Columbia University, New York, NY 10027, USA.

Publication

BMC Bioinformatics. 2006 Mar 20;7 Suppl 1(Suppl 1):S5. doi: 10.1186/1471-2105-7-S1-S5.

DOI:10.1186/1471-2105-7-S1-S5
PMID:16723008
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1810316/
Abstract

BACKGROUND

We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular microarray experiment based on two kinds of input: the presence of binding site subsequences ("motifs") in the gene's regulatory region, and the expression levels in that experiment of regulators such as transcription factors ("parents"). GeneClass formulates the learning task as a classification problem: predicting +1 and -1 labels that correspond to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. Using the AdaBoost algorithm, GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree.
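The classification setup described above can be illustrated with a minimal sketch. This is not the GeneClass code. The function names (`discretize`, `build_examples`), the symmetric `noise_threshold`, and the array layouts are assumptions made for illustration only; the paper's actual noise model is not specified in this abstract.

```python
import numpy as np

def discretize(log_ratios, noise_threshold=1.0):
    """Map expression log-ratios to +1 (up), -1 (down), or 0 (within noise).

    The fixed symmetric threshold is a placeholder assumption, not the
    noise model used by the authors.
    """
    labels = np.zeros(log_ratios.shape, dtype=int)
    labels[log_ratios > noise_threshold] = 1
    labels[log_ratios < -noise_threshold] = -1
    return labels

def build_examples(motif_presence, parent_states, target_labels):
    """Build one training example per (gene, experiment) pair with a nonzero label.

    motif_presence: (genes, motifs) array of 0/1 motif indicators
    parent_states:  (parents, experiments) array in {-1, 0, +1}
    target_labels:  (genes, experiments) array in {-1, 0, +1}
    Returns X with shape (examples, motifs + parents) and y in {-1, +1}.
    """
    genes, exps = np.nonzero(target_labels)
    X = np.hstack([motif_presence[genes], parent_states[:, exps].T])
    y = target_labels[genes, exps]
    return X, y
```

Pairs whose expression change stays within the threshold receive label 0 and are left out of the training set, which mirrors the idea of predicting only changes that exceed the noise level.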

METHODS

In the current work, we introduce a new, robust version of the GeneClass algorithm that increases stability and computational efficiency, yielding a more scalable and reliable predictive model. The improved stability of the prediction tree enables us to introduce a detailed post-processing framework for biological interpretation, including individual and group target gene analysis to reveal condition-specific regulation programs and to suggest signaling pathways. Robust GeneClass uses a novel stabilized variant of boosting that allows a set of correlated features, rather than a single feature, to be included at each node of the tree; in this way, biologically important features that are correlated with the single best feature are retained rather than decorrelated and lost in the next round of boosting. Other computational developments include fast matrix computation of the loss function for all features, which allows scaling to large datasets, and the use of abstaining weak rules, which results in a shallower and more interpretable tree. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the GeneClass algorithm, and we use an improved noise model for gene expression data.
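Two of the ideas mentioned above, abstaining weak rules and matrix computation of the boosting loss for all features at once, can be sketched as follows. This is plain confidence-rated AdaBoost with abstaining rules in the style of Schapire and Singer. It is not the alternating decision tree or the stabilized correlated-feature variant of Robust GeneClass, and the names `boost_abstaining` and `predict_score` are hypothetical. Each rule predicts only where its binary feature is present and abstains elsewhere. Its loss is Z = W0 + 2·sqrt(W+ · W-), where W+, W- and W0 are the total weights of the positive, negative, and abstained examples.

```python
import numpy as np

def boost_abstaining(F, y, rounds=10, eps=1e-9):
    """Boost rules of the form 'if feature j is present, vote alpha_j; else abstain'.

    F: (examples, features) 0/1 matrix; y: labels in {-1, +1}.
    Returns a list of (feature_index, alpha) pairs.
    """
    n, d = F.shape
    w = np.full(n, 1.0 / n)                    # example weights, sum to 1
    rules = []
    for _ in range(rounds):
        # One matrix-vector product gives the weight sums for every feature.
        w_pos = F.T @ (w * (y == 1))           # feature present, label +1
        w_neg = F.T @ (w * (y == -1))          # feature present, label -1
        w_abstain = 1.0 - w_pos - w_neg        # feature absent, rule abstains
        Z = w_abstain + 2.0 * np.sqrt(w_pos * w_neg)
        j = int(np.argmin(Z))                  # feature with the smallest loss
        alpha = 0.5 * np.log((w_pos[j] + eps) / (w_neg[j] + eps))
        rules.append((j, alpha))
        w = w * np.exp(-alpha * y * F[:, j])   # abstained examples keep their weight
        w /= w.sum()
    return rules

def predict_score(F, rules):
    """Sum the votes of all rules; the sign of the score is the predicted label."""
    score = np.zeros(F.shape[0])
    for j, alpha in rules:
        score += alpha * F[:, j]
    return score
```

Because an abstaining rule leaves the weights of uncovered examples unchanged, later rounds can focus on those examples directly rather than splitting them further down a tree, which is one intuition for why such rules can give a shallower model.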

RESULTS

Using the improved scalability of Robust GeneClass, we present larger-scale experiments on a yeast environmental stress dataset, training and testing on all genes and using a comprehensive set of potential regulators. We demonstrate the improved stability of the features in the learned prediction tree, and we show the utility of the post-processing framework by analyzing two groups of genes in yeast (the protein chaperones and a set of putative targets of the Nrg1 and Nrg2 transcription factors) and suggesting novel hypotheses about their transcriptional and post-transcriptional regulation. Detailed results and Robust GeneClass source code are available for download from http://www.cs.columbia.edu/compbio/robust-geneclass.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc41/1810316/0c37ba2b695c/1471-2105-7-S1-S5-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc41/1810316/90ad0a4162a3/1471-2105-7-S1-S5-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc41/1810316/febe26bf6795/1471-2105-7-S1-S5-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc41/1810316/62d2f3020296/1471-2105-7-S1-S5-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc41/1810316/5baa26de7806/1471-2105-7-S1-S5-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc41/1810316/c2e0dda5627f/1471-2105-7-S1-S5-6.jpg