Computational analysis of plant RNA Pol-II promoters.

Author information

Pandey S P, Krishnamachari A

Affiliation

Bioinformatics Centre, School of Information Technology, Jawaharlal Nehru University, Hall No. 6, Lecture Halls Complex, New Delhi 110067, India.

Publication information

Biosystems. 2006 Jan;83(1):38-50. doi: 10.1016/j.biosystems.2005.09.001. Epub 2005 Oct 19.

DOI:10.1016/j.biosystems.2005.09.001

PMID:16236422

Abstract

Plant promoters have not yet been thoroughly analyzed in terms of their structural and sequence-dependent properties, such as curvature, periodicity and information content; the present study is an attempt in that direction. Results were compared with E. coli and yeast data to gain insight into promoter organization. Promoters having the TATA box (TATA(+)) and those lacking it (TATA(-)) were also analyzed separately. Plant promoters were found to differ markedly from those of E. coli and yeast in all of these properties. A bias toward A+T was observed in the promoters of all three groups. Compared with E. coli and yeast, plant promoters showed intermediate values for both A+T content and curvature. The analysis showed that curvature is more pronounced in core promoters than in non-promoters. Information-theoretic analysis of plant promoters reveals high information content at certain consensus regions, such as -30 (TATA box) and the +1 transcription start site (TSS), and moderate values at other positions as well. This factor was taken into account while developing weight matrices. For certain threshold values, these weight matrices could pick up all true positives and reduce false positives to a great extent in a test set. A new multi-parameterized prediction strategy is proposed that uses a combination of sequence composition, curvature and position weight matrices to identify plant promoters. This strategy was tested and validated with experimentally known promoter sequences. Our study is novel in using in silico approaches to study the sequence-dependent properties of plant RNA Pol-II promoters and their prediction, and important because there is no dedicated promoter search tool for plants.
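
The abstract names three sequence-level quantities (A+T content, per-position information content, and position weight matrices) without giving formulas. The Python sketch below illustrates one common way to compute them. It is not the authors' implementation: the function names, the pseudocount of 0.5, the uniform background, and the four toy TATA-like sites are all assumptions made for illustration. Curvature and the paper's thresholds are not modeled.

```python
import math
from collections import Counter

BASES = "ACGT"  # sequences are assumed to be uppercase A/C/G/T only


def at_content(seq):
    """Fraction of A and T bases in a sequence."""
    return (seq.count("A") + seq.count("T")) / len(seq)


def position_frequencies(aligned, pseudocount=0.5):
    """Per-column base frequencies from equal-length aligned sites.

    The pseudocount (an assumed value) keeps every frequency above zero.
    """
    total = len(aligned) + pseudocount * len(BASES)
    freqs = []
    for i in range(len(aligned[0])):
        counts = Counter(site[i] for site in aligned)
        freqs.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return freqs


def information_content(freqs):
    """Information per column in bits: 2 minus the Shannon entropy."""
    return [2.0 + sum(p * math.log2(p) for p in col.values()) for col in freqs]


def log_odds_matrix(freqs, background=None):
    """Log2-odds weight matrix against a background (uniform by default)."""
    background = background or {b: 0.25 for b in BASES}
    return [{b: math.log2(col[b] / background[b]) for b in BASES} for col in freqs]


def best_hit(seq, pwm):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max(
        (sum(pwm[j][seq[i + j]] for j in range(width)), i)
        for i in range(len(seq) - width + 1)
    )


if __name__ == "__main__":
    # Toy sites invented for illustration; not data from the study.
    sites = ["TATAAA", "TATATA", "TATAAA", "TATAAT"]
    freqs = position_frequencies(sites)
    print([round(bits, 2) for bits in information_content(freqs)])
    print(best_hit("GCGCTATAAAGGCC", log_odds_matrix(freqs)))
```

In this toy setup the fully conserved columns carry more information than the variable ones, which is the kind of positional signal the abstract describes at the TATA box and the TSS. A real predictor would combine the matrix score with composition and curvature and apply a threshold tuned on known promoters.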
