School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.
Department of Computing, The Hong Kong Polytechnic University, Hong Kong.
Sci Rep. 2016 Jun 10;6:27653. doi: 10.1038/srep27653.
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches use only the sequence context of a target residue to predict whether it binds DNA. In the present study, we used both the spatial context and the sequence context of each target residue to construct the feature space. Latent Semantic Analysis (LSA) was then applied to remove redundancy from the feature space. Finally, a predictor, PDNAsite, was developed by integrating a support vector machine (SVM) classifier with ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context are more informative than those extracted from the sequence context, and that combining the two yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between neighboring binding sites are important for protein-DNA recognition and binding ability. A comparison between PDNAsite and existing methods indicates that PDNAsite outperforms most of them and is a useful tool for identifying DNA-binding sites. A web server for the predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely available to the biological research community.
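The abstract does not specify how the two contexts are delimited. As an illustration only, the sketch below assumes that the sequence context is a fixed window of ±w residues around the target and that the spatial context is the set of residues whose representative atom (assumed to be the Cα) lies within a distance cutoff of the target's. The window size, the 8.0 Å cutoff, the 'X' padding symbol, the amino-acid-composition encoding, and every function name here are hypothetical choices, not PDNAsite's actual features. The sketch also counts labeled binding sites in the spatial context, mirroring the analysis described above.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical feature alphabet: the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Coord = Tuple[float, float, float]


def sequence_context(sequence: str, i: int, half_window: int = 3) -> str:
    """Residues at positions i-half_window .. i+half_window.

    Positions beyond either chain terminus are padded with 'X'
    (an assumed padding convention).
    """
    return "".join(
        sequence[j] if 0 <= j < len(sequence) else "X"
        for j in range(i - half_window, i + half_window + 1)
    )


def spatial_context(coords: Sequence[Coord], i: int, cutoff: float = 8.0) -> List[int]:
    """Indices of residues within `cutoff` angstroms of residue i.

    `coords` holds one representative atom per residue (assumed C-alpha).
    The target itself is excluded; neighbors are sorted by distance.
    """
    hits = []
    for j, c in enumerate(coords):
        if j != i:
            d = math.dist(coords[i], c)
            if d <= cutoff:
                hits.append((d, j))
    return [j for _, j in sorted(hits)]


def binding_sites_in_context(neighbors: Sequence[int], labels: Sequence[int]) -> int:
    """Number of known binding residues (label 1) among the neighbors."""
    return sum(1 for j in neighbors if labels[j] == 1)


def composition(residues: str) -> List[float]:
    """Relative frequency of each standard amino acid; 'X' is ignored."""
    counts = [residues.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def context_features(sequence: str, coords: Sequence[Coord], i: int,
                     half_window: int = 3, cutoff: float = 8.0) -> List[float]:
    """Concatenate a sequence-context and a spatial-context encoding (40 values)."""
    seq_part = composition(sequence_context(sequence, i, half_window))
    neighbors = spatial_context(coords, i, cutoff)
    spa_part = composition("".join(sequence[j] for j in neighbors))
    return seq_part + spa_part


if __name__ == "__main__":
    # Toy chain: residues 0-4 lie on a line 3.8 A apart; residue 5 folds back
    # to sit 5.0 A from residue 0, outside residue 0's sequence window.
    seq = "ACDEFG"
    xyz = [(3.8 * k, 0.0, 0.0) for k in range(5)] + [(0.0, 5.0, 0.0)]
    labels = [0, 1, 0, 0, 0, 1]
    print(sequence_context(seq, 0, 2))              # XXACD
    nbrs = spatial_context(xyz, 0)
    print(nbrs)                                     # [1, 5, 2]
    print(binding_sites_in_context(nbrs, labels))   # 2
    print(len(context_features(seq, xyz, 0, 2)))    # 40
```

The toy example shows why the spatial context can carry information that the sequence window misses: residue 5 is a spatial neighbor of residue 0 even though it lies outside a ±2 sequence window. In a full pipeline along the lines described in the abstract, vectors like these would be stacked into a matrix, reduced with LSA (a truncated SVD), and passed to the SVM-based ensemble; those steps are omitted here.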