Machine learning for regulatory analysis and transcription factor target prediction in yeast.

Authors

Holloway Dustin T, Kon Mark, Delisi Charles

Affiliation

Molecular Biology Cell Biology and Biochemistry, Boston University, Boston, MA 02215, USA.

Publication details

Syst Synth Biol. 2007 Mar;1(1):25-46. doi: 10.1007/s11693-006-9003-3.

Abstract

High-throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps: the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position-specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73% vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors, SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties that distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high-quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors that have available binding data.
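The abstract compares the SVM and PSSM approaches by their sensitivity at equal specificity. The sketch below illustrates one way such a comparison could be set up: for each scoring method, pick the lowest score threshold that reaches a target specificity, then report the sensitivity at that threshold. It is not the authors' procedure. The function name, the toy scores, and the labels are all hypothetical and do not come from the paper, and for simplicity only specificity is matched (the paper also holds positive predictive value equal).

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Find the lowest score threshold whose specificity meets the target.

    A promoter is predicted to be a target when score >= threshold.
    labels use 1 for known targets and 0 for known non-targets.
    Returns (threshold, sensitivity, specificity).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Candidate thresholds: every observed score, plus one above all of them
    # (that one predicts no targets at all, so its specificity is 1.0).
    candidates = sorted(set(scores)) + [float("inf")]
    for threshold in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        specificity = (n_neg - fp) / n_neg
        if specificity >= target_specificity:
            return threshold, tp / n_pos, specificity
    raise ValueError("target_specificity must be at most 1.0")


# Hypothetical toy data: 5 known targets followed by 5 known non-targets.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Made-up scores standing in for two scoring methods (not real SVM/PSSM output).
method_a_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.65, 0.3, 0.25, 0.1, 0.05]
method_b_scores = [0.9, 0.4, 0.3, 0.2, 0.1, 0.7, 0.5, 0.45, 0.35, 0.15]

result_a = sensitivity_at_specificity(method_a_scores, labels, 0.8)
result_b = sensitivity_at_specificity(method_b_scores, labels, 0.8)
print("method A (threshold, sensitivity, specificity):", result_a)
print("method B (threshold, sensitivity, specificity):", result_b)
```

On this toy data both methods reach the same specificity but differ in sensitivity. The abstract's comparison has the same structure, though its figures (73% vs. 20%) come from 104 real regulators, not from data like this.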

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f72e/2533145/6116684969b9/11693_2006_9003_Fig1_HTML.jpg
