Indian Institute of Science, Bangalore 560 012, India.
Plant Physiol. 2011 Jul;156(3):1300-15. doi: 10.1104/pp.110.167809. Epub 2011 Apr 29.
The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, which is an important point of control in the regulation of gene expression. We present here an analysis of the performance of an in silico method for predicting cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42% for Arabidopsis and 97% and 31% for rice, respectively. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice, respectively. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts, out of which 20% were found in the first intron alone, indicating possible regulatory roles. The predictions for orthologous genes from the two genomes showed a good correlation with respect to prediction scores and promoter organization. Comparison of our results with an existing program for promoter prediction in plant genomes indicates that our method shows improved prediction capability.
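The recall and precision figures quoted above follow the standard definitions: recall is the fraction of annotated genes that receive a correct prediction, and precision is the fraction of predictions that are correct. A minimal sketch of the calculation is given below. The function name and the counts in the usage example are hypothetical and are not taken from the paper, which may define true and false positives with additional positional criteria.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from raw prediction counts.

    tp: predictions that match an annotated regulatory region
    fp: predictions with no matching annotation
    fn: annotated regions that received no prediction
    All three counts are illustrative inputs, not values from the paper.
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("need at least one prediction and one annotated region")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical counts, chosen only to show the calculation.
precision, recall = precision_recall(tp=8, fp=2, fn=2)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High recall combined with lower precision, as reported for the protein-coding genes, corresponds to few missed genes (small `fn`) but many extra predictions (large `fp`).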