
Prediction-based approaches to characterize bidirectional promoters in the mammalian genome.

Author information

Yang Mary Qu, Elnitski Laura L

Affiliation

National Human Genome Research Institute, National Institutes of Health, US Department of Health and Human Services, Bethesda, MD 20892, USA.

Publication information

BMC Genomics. 2008;9 Suppl 1(Suppl 1):S2. doi: 10.1186/1471-2164-9-S1-S2.

Abstract

BACKGROUND

Machine learning approaches are emerging as a way to discriminate among various classes of functional elements. Previous attempts to create Regulatory Potential (RP) scores that discriminate functional from nonfunctional DNA included Markov models trained to distinguish promoter and enhancer sequences from ancestral repeats. We proposed that the knowledge gleaned from those methods could be further refined by using a multiple class predictor to separate classes of promoter elements from enhancers and from nonfunctional DNA.
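To illustrate the general idea behind Markov-model discrimination, here is a minimal sketch, not the published RP method. It trains two fixed-order Markov models, one on putatively functional sequence and one on background sequence (standing in for ancestral repeats), and scores a query by its average per-base log2 likelihood ratio. The published RP scores are computed from multi-species alignments; this sketch simplifies to a single DNA sequence, and the function names, the order of 2, the pseudocount, and the toy training strings are assumptions for illustration only.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) with add-pseudocount smoothing.

    Illustrative only: a fixed-order, single-sequence model, not the
    alignment-based models behind published RP scores.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip positions touching ambiguous bases such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    for ctx_tuple in product(ALPHABET, repeat=order):
        ctx = "".join(ctx_tuple)
        seen = counts.get(ctx, {})
        total = sum(seen.values()) + pseudocount * len(ALPHABET)
        model[ctx] = {b: (seen.get(b, 0.0) + pseudocount) / total for b in ALPHABET}
    return model

def log_odds_score(seq, functional, background, order=2):
    """Average per-base log2 ratio of P(seq | functional) to P(seq | background)."""
    seq = seq.upper()
    total, n = 0.0, 0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        if ctx in functional and base in ALPHABET:
            total += math.log2(functional[ctx][base] / background[ctx][base])
            n += 1
    return total / n if n else 0.0

# Toy training data (invented): GC-rich "functional" vs AT-rich "background".
functional = train_markov(["GC" * 50])
background = train_markov(["AT" * 50])
print(log_odds_score("GCGCGCGCGC", functional, background))  # positive: resembles functional
print(log_odds_score("ATATATATAT", functional, background))  # negative: resembles background
```

A positive score means the query is better explained by the functional model than by the background model; the pseudocount keeps unseen transitions from producing infinite log ratios.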

RESULTS

We extended our previous work, which identified over 5,000 candidate bidirectional promoters in the human genome, by mapping the orthologous promoter regions in the mouse genome. Our algorithm measured the robustness of the evidence provided by spliced EST annotations and incorporated evidence from UCSC Known Genes and GenBank mRNA annotations. In preparation for de novo prediction of this promoter type, we examined characteristic features of the dataset as a whole. For instance, bidirectional promoters score very highly in Regulatory Potential (RP) scores compared with all other functional elements. This result was unexpected given the limited sequence conservation found in these noncoding regions. We demonstrate that bidirectional promoters can be classified apart from other genomic features, including non-bidirectional promoters, i.e. promoters that have no nearby upstream gene. Furthermore, bidirectional promoters consistently score at the level of the very highly conserved functional elements in the genome: developmental enhancers. The high scores are due to sequence-based characteristics within the promoters, not to the surrounding exons. These results indicate that high-scoring RP regions can be deconvoluted into distinct functional classes of genomic elements. Using a multiple class predictor, we are able to discriminate bidirectional promoters from enhancers, non-bidirectional promoters, and non-promoter regions on the basis of RP scores and CpG islands.
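Bidirectional promoters are commonly described by a head-to-head arrangement: two genes on opposite strands whose transcription start sites (TSSs) lie close together. The sketch below assumes that definition, a hypothetical `Gene` record holding only a TSS coordinate, and an assumed 1,000 bp cutoff. The abstract does not state the distance threshold or how EST, Known Gene, and mRNA evidence are combined, so none of these details should be read as the authors' pipeline; the gene names are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record: only what the pairing rule needs."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site, same coordinate system for all genes

def bidirectional_pairs(genes, max_distance=1000):
    """Return (minus_gene, plus_gene) name pairs arranged head to head.

    A '-' strand gene transcribes toward lower coordinates and a '+' strand
    gene toward higher ones, so the pair diverges when the '-' TSS sits at or
    below the '+' TSS. max_distance (assumed 1,000 bp) caps the TSS gap.
    """
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)
    pairs = []
    for chrom_genes in by_chrom.values():
        minus = [g for g in chrom_genes if g.strand == "-"]
        plus = [g for g in chrom_genes if g.strand == "+"]
        for m in minus:
            for p in plus:
                if 0 <= p.tss - m.tss <= max_distance:
                    pairs.append((m.name, p.name))
    return pairs

# Placeholder genes: A and B diverge 500 bp apart; C is too far; D is on another chromosome.
genes = [
    Gene("A", "chr1", "-", 1_000),
    Gene("B", "chr1", "+", 1_500),
    Gene("C", "chr1", "+", 5_000),
    Gene("D", "chr2", "+", 1_200),
]
print(bidirectional_pairs(genes))
```

A promoter that has no partner under this rule would fall into the non-bidirectional class described above. The all-pairs loop is quadratic per chromosome; sorting TSSs and sweeping would scale better on a whole genome.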

CONCLUSIONS

We examine orthology at bidirectional promoters, use discriminatory machine learning approaches to differentiate multiple types of promoters from other functional and nonfunctional features in the genome, and begin the process of deconvoluting the classes of functional regions that score well with RP scores. Approaches of this type are a precursor to supervised learning techniques for discovering unannotated promoter regions.
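The abstract does not name the multiple class predictor, so the following sketch swaps in a simple nearest-centroid classifier to show what separating the four classes from two features could look like. Each region is assumed to be summarized by a hypothetical mean RP score and the fraction of the region covered by a CpG island; the training values are invented toy numbers chosen to be separable, not measurements from the study.

```python
import math
from statistics import mean, pstdev

def fit_nearest_centroid(X, y):
    """Standardize each feature, then store one centroid per class label."""
    n_features = len(X[0])
    mu = [mean(row[j] for row in X) for j in range(n_features)]
    # Guard against a constant feature (zero spread) by falling back to 1.0.
    sd = [pstdev([row[j] for row in X]) or 1.0 for j in range(n_features)]

    def scale(row):
        return [(row[j] - mu[j]) / sd[j] for j in range(n_features)]

    centroids = {}
    for label in set(y):
        rows = [scale(r) for r, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return scale, centroids

def predict(model, row):
    """Assign the label whose centroid is closest in standardized feature space."""
    scale, centroids = model
    z = scale(row)
    return min(centroids, key=lambda lab: math.dist(z, centroids[lab]))

# Invented toy features: [mean RP score, fraction covered by a CpG island].
X = [
    [0.90, 0.90], [0.85, 0.80],   # bidirectional
    [0.50, 0.50], [0.45, 0.40],   # non-bidirectional
    [0.90, 0.05], [0.80, 0.10],   # enhancer
    [0.10, 0.00], [0.15, 0.05],   # non-promoter
]
y = ["bidirectional"] * 2 + ["non-bidirectional"] * 2 + ["enhancer"] * 2 + ["non-promoter"] * 2

model = fit_nearest_centroid(X, y)
print(predict(model, [0.88, 0.85]))
```

In this toy setup the RP feature alone cannot separate bidirectional promoters from enhancers, since both are high; the CpG island feature is what splits them, which mirrors the pairing of RP scores and CpG islands described in the Results.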

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a05c/2386062/8abb3ae9706e/1471-2164-9-S1-S2-1.jpg