Crowley E M, Roeder K, Bina M
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA.
J Mol Biol. 1997 Apr 25;268(1):8-14. doi: 10.1006/jmbi.1997.0965.
In addition to genes, chromosomal DNA contains sequences that serve as signals for turning on and off gene expression. These signals are thought to be distributed as clusters in the regulatory regions of genes. We develop a Bayesian model that views locating regulatory regions in genomic DNA as a change-point problem, with the beginning of regulatory and non-regulatory regions corresponding to the change points. The model is based on a hidden Markov chain. The data consist of nucleotide positions of protein-binding elements in a genomic DNA sequence. These positions are identified using a reference catalogue containing elements that interact with transcription factors implicated in controlling the expression of protein-encoding genes. Among the protein-binding elements in a genomic DNA sequence, the statistical model automatically selects those that tend to predict regulatory regions. We test the model using viral sequences that include known regulatory regions and provide the results obtained for human genomic DNA corresponding to the beta globin locus on chromosome 11.
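The abstract frames region finding as a change-point problem on a hidden Markov chain, but it does not give the model itself. The Python sketch below is a simplified, non-Bayesian illustration of that framing and is not the authors' model. It assumes the following:

- The sequence is cut into fixed-size windows.
- The number of protein-binding-element hits per window follows a Poisson distribution, with a higher rate in regulatory regions.
- The rates, the switching probability, and the initial state probabilities are fixed, made-up values.

A switch of hidden state between adjacent windows plays the role of a change point, and the standard forward-backward recursions give each window's posterior probability of being regulatory. The helper names (`window_counts`, `posterior_regulatory`) and all parameter values are hypothetical. The sketch does not model the step in which the authors' model selects which element types are predictive.

```python
import math
from typing import List, Sequence, Tuple


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k hits under a Poisson(lam) count model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def window_counts(positions: Sequence[int], seq_len: int, window: int) -> List[int]:
    """Count binding-element hits in consecutive fixed-size windows (hypothetical binning)."""
    counts = [0] * ((seq_len + window - 1) // window)
    for p in positions:
        if not 0 <= p < seq_len:
            raise ValueError(f"position {p} outside sequence of length {seq_len}")
        counts[p // window] += 1
    return counts


def posterior_regulatory(
    counts: Sequence[int],
    rates: Tuple[float, float] = (0.5, 4.0),  # assumed hits/window: (non-regulatory, regulatory)
    stay: float = 0.95,                        # assumed probability of NOT switching state
    init: Tuple[float, float] = (0.9, 0.1),    # assumed initial state probabilities
) -> List[float]:
    """Forward-backward posterior P(window is regulatory) for a 2-state HMM.

    State 0 = non-regulatory, state 1 = regulatory; a state switch between
    adjacent windows corresponds to a change point.
    """
    n = len(counts)
    if n == 0:
        return []
    log_a = [[math.log(stay), math.log(1 - stay)],
             [math.log(1 - stay), math.log(stay)]]
    emit = [[poisson_logpmf(c, r) for r in rates] for c in counts]

    # Forward pass: fwd[t][s] = log P(counts[0..t], state_t = s)
    fwd = [[0.0, 0.0] for _ in range(n)]
    for s in range(2):
        fwd[0][s] = math.log(init[s]) + emit[0][s]
    for t in range(1, n):
        for s in range(2):
            fwd[t][s] = emit[t][s] + logsumexp([fwd[t - 1][r] + log_a[r][s] for r in range(2)])

    # Backward pass: bwd[t][s] = log P(counts[t+1..] | state_t = s)
    bwd = [[0.0, 0.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in range(2):
            bwd[t][s] = logsumexp([log_a[s][r] + emit[t + 1][r] + bwd[t + 1][r] for r in range(2)])

    # Combine the passes and normalise over the two states in each window
    post = []
    for t in range(n):
        joint = [fwd[t][s] + bwd[t][s] for s in range(2)]
        post.append(math.exp(joint[1] - logsumexp(joint)))
    return post


if __name__ == "__main__":
    # Toy, made-up hit positions: sparse hits plus a dense cluster around 800-1200
    hits = [150, 620, 810, 830, 850, 870, 905, 930, 960,
            1010, 1040, 1070, 1120, 1150, 1180, 1650]
    counts = window_counts(hits, seq_len=2000, window=100)
    for i, p in enumerate(posterior_regulatory(counts)):
        print(f"window {i:2d}  hits {counts[i]}  P(regulatory) = {p:.3f}")
```

Working in log space keeps the recursions from underflowing on long sequences. A fully Bayesian variant would place priors on the rates and the switching probability and average over them, for example by Markov chain Monte Carlo, rather than fixing them as this sketch does.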