Statistical analysis of over-represented words in human promoter sequences.

Author information

Mariño-Ramírez Leonardo, Spouge John L, Kanga Gavin C, Landsman David

Affiliations

Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, MSC 6075 Bethesda, MD 20894-6075, USA.

Publication information

Nucleic Acids Res. 2004 Feb 12;32(3):949-58. doi: 10.1093/nar/gkh246. Print 2004.

Abstract

The identification and characterization of regulatory sequence elements in the proximal promoter region of a gene can be facilitated by knowing the precise location of the transcriptional start site (TSS). Using known TSSs from more than 5700 different human full-length cDNAs, this study extracted a set of 4737 distinct putative promoter regions (PPRs) from the human genome. Each PPR consisted of nucleotides from -2000 to +1000 bp, relative to the corresponding TSS. Since many regulatory regions contain short, highly conserved strings of fewer than 10 nucleotides, we counted eight-letter words within the PPRs, using z-scores and other related statistics to evaluate their over- and under-representation. Several over-represented eight-letter words have known biological functions described in the eukaryotic transcription factor database TRANSFAC; however, many do not. Besides calculating a P-value with the standard normal approximation associated with z-scores, we used two extra statistical controls to evaluate the significance of over-represented words. These controls have important implications for evaluating over- and under-represented words with z-scores.
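The word-counting and z-score step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the background model used, so the code below assumes a simple one: bases are independent with frequencies estimated from the input sequences, and the count of each word is treated as binomial. That variance ignores correlations between overlapping occurrences of a word, so it is only a rough approximation. All function names (`count_words`, `base_frequencies`, `word_z_scores`, `upper_tail_p`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def count_words(sequences, k=8):
    """Count every k-letter word made only of A, C, G, T (overlapping windows)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):  # skip windows containing N or other symbols
                counts[word] += 1
    return counts


def base_frequencies(sequences):
    """Estimate single-base frequencies from the sequences themselves."""
    base_counts = Counter()
    for seq in sequences:
        base_counts.update(b for b in seq.upper() if b in "ACGT")
    total = sum(base_counts.values())
    return {b: base_counts[b] / total for b in "ACGT"}


def word_z_scores(sequences, k=8):
    """z = (observed - expected) / sqrt(variance) under the ASSUMED
    independent-base, binomial background model (not the paper's model)."""
    counts = count_words(sequences, k)
    freqs = base_frequencies(sequences)
    n_positions = sum(counts.values())  # number of valid windows examined
    scores = {}
    for word, observed in counts.items():
        p = math.prod(freqs[b] for b in word)  # probability of the word at one position
        expected = n_positions * p
        variance = n_positions * p * (1 - p)
        if variance > 0:
            scores[word] = (observed - expected) / math.sqrt(variance)
    return scores


def upper_tail_p(z):
    """One-sided P-value from the standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Calling `word_z_scores(pprs)` on a list of promoter sequences and sorting the result by score would list candidate over-represented words first; `upper_tail_p` converts a z-score into the normal-approximation P-value that the abstract mentions. The two additional statistical controls referred to in the abstract are not described there and are not reproduced here.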
