Brazma A, Vilo J, Ukkonen E, Valtonen K
Institute of Mathematics and Computer Science, University of Latvia.
Proc Int Conf Intell Syst Mol Biol. 1997;5:65-74.
We examined methods and developed a general software tool for finding and analyzing combinations of transcription factor binding sites that occur relatively often in gene upstream regions (putative promoter regions) of the yeast genome. Such frequently occurring combinations may be essential parts of possible promoter classes. The regions upstream of all genes were first extracted from the yeast genome database MIPS using the information in the database's annotation files. Those that do not overlap with coding regions were selected for further study. Next, all occurrences of the yeast transcription factor binding sites given in the IMD database were located in the genome, and in the selected regions in particular. Finally, using general-purpose data mining software in combination with our own software, which parametrizes the search, we find the combinations of binding sites that occur in the upstream regions more frequently than would be expected from the frequencies of the individual sites. The procedure also finds so-called association rules present in such combinations. The tool is available for use through the WWW.
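The core idea, comparing the observed co-occurrence of binding sites with what independent occurrence would predict and then reading off association rules, can be illustrated with a minimal Python sketch. It is not the tool described here: the sequences, the site names `siteX` and `siteY`, their patterns, and all function names are hypothetical, and exact substring matching stands in for whatever site matching the actual software performs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data, not real yeast sequences or database entries.
UPSTREAM = {
    "geneA": "TTGACTCAAACCCGTACGGG",
    "geneB": "GGTGACTCATTTCCGTACGA",
    "geneC": "AAAATGACTCATTTTTTTTT",
    "geneD": "CCCCCCCCGTACGTTTTTTT",
    "geneE": "ATATATATATATATATATAT",
}
SITES = {"siteX": "TGACTCA", "siteY": "CGTACG"}  # illustrative patterns only


def sites_per_region(regions, sites):
    """Map each region name to the set of site names found in its sequence."""
    return {
        name: {site for site, pattern in sites.items() if pattern in seq}
        for name, seq in regions.items()
    }


def overrepresented_pairs(presence, min_ratio=1.0):
    """Return site pairs whose observed co-occurrence exceeds the count
    expected if the two sites occurred independently across regions."""
    n = len(presence)
    single = Counter(s for found in presence.values() for s in found)
    pairs = Counter(
        pair
        for found in presence.values()
        for pair in combinations(sorted(found), 2)
    )
    results = []
    for (a, b), observed in pairs.items():
        expected = single[a] * single[b] / n  # independence assumption
        ratio = observed / expected
        if ratio > min_ratio:
            results.append((a, b, observed, expected, ratio))
    return results


def rule_confidence(presence, antecedent, consequent):
    """Confidence of the rule 'antecedent => consequent': the fraction of
    regions containing the antecedent that also contain the consequent."""
    with_antecedent = [f for f in presence.values() if antecedent in f]
    if not with_antecedent:
        return 0.0
    return sum(consequent in f for f in with_antecedent) / len(with_antecedent)


presence = sites_per_region(UPSTREAM, SITES)
for a, b, observed, expected, ratio in overrepresented_pairs(presence):
    print(f"{a}+{b}: observed={observed}, expected={expected:.2f}, ratio={ratio:.2f}")
print("confidence(siteX => siteY):", rule_confidence(presence, "siteX", "siteY"))
```

In this toy set both sites occur in three of the five regions and together in two, so the expected joint count under independence is 3 × 3 / 5 = 1.8. A real analysis would need a proper significance test and many more regions; the simple ratio threshold here only marks where such a test would go.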