Kreps Joel, Budworth Paul, Goff Steve, Wang Ronglin
Torrey Mesa Research Institute, 3115 Merryfield Row, San Diego, CA 92121, USA.
Plant Biotechnol J. 2003 Sep;1(5):345-52. doi: 10.1046/j.1467-7652.2003.00032.x.
A pattern enumeration algorithm, GBSSR, has been developed to search co-expressed gene groups identified through gene chip expression profiling for putative cis-regulatory elements, an important step toward understanding transcription factors, quantitative trait loci and gene regulatory networks. Without making any statistical assumptions, the algorithm establishes the frequency distribution of all eligible 6-15 bp strings by extensive bootstrap sampling from an entire genome's worth of promoters, so that strings over-represented in a co-expressed gene group can be identified. With a well-studied plant cold-responsive gene system as a positive control, several known cold-responsive elements were identified as top-ranking candidates, along with some potentially novel ones. A typical analysis of 40 co-expressed genes takes about 7 days on a relatively inexpensive Linux cluster with 32 × 1.4 GHz Intel CPUs.
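The bootstrap idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the GBSSR implementation, whose details the abstract does not give. The function names (`count_kmers`, `empirical_p_values`), the choice to count each string once per promoter, and the add-one p-value estimate are all assumptions made for illustration. The sketch draws random promoter sets of the same size as the co-expressed group, with replacement, and asks how often a string appears in at least as many promoters as it does in the group.

```python
import random
from collections import Counter


def count_kmers(promoters, k_min=6, k_max=15):
    """For each k-mer, count how many promoters contain it at least once.

    Counting presence per promoter (rather than total occurrences) is an
    assumption of this sketch, not something stated in the abstract.
    """
    counts = Counter()
    for seq in promoters:
        seen = set()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        counts.update(seen)
    return counts


def empirical_p_values(group, genome_promoters, n_boot=1000,
                       k_min=6, k_max=15, seed=0):
    """Empirical p-value for every k-mer observed in the co-expressed group.

    Each bootstrap round samples len(group) promoters from the genome-wide
    set with replacement and records whether the k-mer is found in at least
    as many promoters as in the real group.
    """
    rng = random.Random(seed)
    observed = count_kmers(group, k_min, k_max)
    exceed = Counter()
    for _ in range(n_boot):
        sample = rng.choices(genome_promoters, k=len(group))
        boot = count_kmers(sample, k_min, k_max)
        for kmer, obs in observed.items():
            if boot.get(kmer, 0) >= obs:
                exceed[kmer] += 1
    # Add-one estimate keeps p-values strictly positive (hypothetical choice).
    return {kmer: (exceed[kmer] + 1) / (n_boot + 1) for kmer in observed}
```

Sorting the returned dictionary by p-value gives a ranked candidate list. Enumerating every 6-15 bp string across many bootstrap rounds is expensive, which is consistent with the multi-day cluster run time reported above.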