Systematic detection of statistically overrepresented DNA motif association rules.

Author information

Lin Jane Marie, Weng Zhiping

Affiliation

Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA.

Publication information

Genome Inform. 2006;17(1):124-33.

Abstract

DNA motifs, or cis-elements, are short nucleotide sequence patterns recognized by transcription factors (TFs). In promoters, TFs bind in complex combinations to regulate the expression of a downstream gene. The combinatorial space is often large and difficult to manage because vertebrates have thousands of transcription factors and more than 20,000 genes. We introduce CAYCE (Combinatorial AnalYsis of Cis-Elements), a computer program that systematically detects statistically overrepresented DNA motif association rules without relying on microarray information. CAYCE adapts the Apriori algorithm traditionally used for association rule mining and offers three significant advancements: (1) it analyzes multiple occurrences of an item, corresponding to multiple TF binding sites; (2) it compares results with a biologically relevant background; and (3) it provides p-values for straightforward statistical interpretation. CAYCE can be readily applied to any item-set data where the investigator is also interested in multiple occurrences of a single item and/or overrepresentation of association rules compared with a background. Applying CAYCE to human promoters in 1% of the human genome, we find that motif clusters containing five repetitions of SP1 are the most statistically significant.
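The three ideas named in the abstract (items with multiplicity, a background comparison, and a p-value per itemset) can be illustrated with a small sketch. This is not CAYCE's implementation: the abstract does not say which statistic the program uses, so the one-sided Fisher exact (hypergeometric) test below is an assumption, as are the function names `expand`, `support`, `hypergeom_tail`, and `mine`, and the toy promoter data.

```python
from collections import Counter
from itertools import combinations
from math import comb


def expand(promoter):
    """Encode multiplicity: a motif seen c times yields items (motif, 1)..(motif, c),
    where (motif, k) means 'at least k sites of this motif'."""
    counts = Counter(promoter)
    return frozenset((m, k) for m, c in counts.items() for k in range(1, c + 1))


def support(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def hypergeom_tail(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided Fisher exact p-value: chance of seeing >= hits_fg foreground hits
    if foreground/background labels were assigned at random (assumed statistic)."""
    total, hits = n_fg + n_bg, hits_fg + hits_bg
    upper = min(hits, n_fg)
    tail = sum(comb(hits, x) * comb(total - hits, n_fg - x)
               for x in range(hits_fg, upper + 1))
    return tail / comb(total, n_fg)


def mine(foreground, background, min_support=2, max_size=3):
    """Level-wise (Apriori-style) search; returns {itemset: p-value}."""
    fg = [expand(p) for p in foreground]
    bg = [expand(p) for p in background]
    items = {i for t in fg for i in t}
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), fg) >= min_support]
    results, size = {}, 1
    while level and size <= max_size:
        for s in level:
            results[s] = hypergeom_tail(support(s, fg), len(fg),
                                        support(s, bg), len(bg))
        # Join frequent sets that differ by one item to build the next level.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == size + 1}
        # Drop redundant sets such as {(SP1, 1), (SP1, 2)}: (SP1, 2) implies (SP1, 1).
        candidates = {c for c in candidates
                      if len({m for m, _ in c}) == len(c)}
        level = [c for c in candidates if support(c, fg) >= min_support]
        size += 1
    return results


# Toy data, invented for illustration only (not from the paper).
toy_foreground = [["SP1", "SP1", "ETS"], ["SP1", "SP1", "ETS"], ["SP1", "SP1"], ["ETS"]]
toy_background = [["SP1"], ["ETS"], ["TATA"], ["SP1", "ETS"], [], ["TATA"]]

if __name__ == "__main__":
    ranked = sorted(mine(toy_foreground, toy_background).items(), key=lambda kv: kv[1])
    for itemset, p in ranked:
        print(sorted(itemset), round(p, 4))
```

Encoding "at least k copies" as a separate item lets an ordinary subset test handle repeated binding sites, which is one simple way to express repeated-motif clusters such as the SP1 result. In a real analysis the p-values would also need multiple-testing correction, which this sketch omits.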
