Cai Yudong, He Jianfeng, Li Xinlei, Lu Lin, Yang Xinyi, Feng Kaiyan, Lu Wencong, Kong Xiangyin
Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
J Proteome Res. 2009 Feb;8(2):999-1003. doi: 10.1021/pr800717y.
Transcription is one of the most important cellular processes, in which transcription factors direct the copying of DNA sequences into RNA sequences. Accurate prediction of the DNA binding preferences of transcription factors is valuable for understanding the transcriptional regulatory mechanism (1) and elucidating regulatory networks (2-4). Here we predict the DNA binding preference of a transcription factor from the amino acid composition and physicochemical properties of the protein, a 0/1 encoding of the nucleotides, the minimum Redundancy Maximum Relevance feature selection method (5), and the Nearest Neighbor Algorithm. The overall prediction accuracy in the Jackknife cross-validation test is 91.1%, indicating that this approach is a useful tool for exploring the relation between a transcription factor and its binding sites. Moreover, we find that the secondary structure and polarizability of the transcription factor contribute most to the prediction. In particular, a 7-nt motif with an AT-rich region in the DNA binding sites, discovered via our method, is consistent with the statistical analysis of the TRANSFAC database (6).
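As a rough illustration of two ingredients named in the abstract, the sketch below combines a 0/1 encoding of nucleotides with a nearest neighbor classifier evaluated by a jackknife (leave-one-out) test. It is not the authors' implementation: the 4-bit code per nucleotide, the Euclidean distance, the function names, and the toy sites and labels are all assumptions made for this example, and the protein features and the feature selection step are omitted.

```python
import math

# Assumed 0/1 encoding: one 4-bit indicator per nucleotide.
# The abstract does not state the exact code used by the authors.
NUCLEOTIDE_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode_site(site):
    """Concatenate the 4-bit codes of every nucleotide in a DNA site."""
    vector = []
    for base in site.upper():
        vector.extend(NUCLEOTIDE_CODE[base])
    return vector


def euclidean(u, v):
    """Euclidean distance (an assumed metric, chosen for simplicity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor_label(query, samples):
    """Return the label of the training sample closest to the query."""
    _, label = min(samples, key=lambda s: euclidean(query, s[0]))
    return label


def jackknife_accuracy(samples):
    """Leave each sample out in turn and predict it from the rest."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(vector, rest) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: 7-nt sites with made-up class labels.
TOY_SITES = [
    ("AATTAAT", 1), ("ATTTAAA", 1), ("TATAAAT", 1),
    ("GCGCCGG", 0), ("CCGGGCC", 0), ("GGCGCGC", 0),
]
toy_samples = [(encode_site(site), label) for site, label in TOY_SITES]
toy_accuracy = jackknife_accuracy(toy_samples)
```

Each 7-nt site becomes a 28-dimensional 0/1 vector under this encoding. In a fuller version, the protein-derived features would be appended to each vector and a feature selection step would be applied before classification.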