iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.

Authors

Liu Bin, Xu Jinghao, Lan Xun, Xu Ruifeng, Zhou Jiyun, Wang Xiaolong, Chou Kuo-Chen

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China; Gordon Life Science Institute, Belmont, Massachusetts, United States of America.

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.

Publication information

PLoS One. 2014 Sep 3;9(9):e106691. doi: 10.1371/journal.pone.0106691. eCollection 2014.

Abstract

DNA-binding proteins play crucial roles in various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression, and are essential components of both eukaryotic and prokaryotic proteomes. With the avalanche of protein sequences generated in the postgenomic age, it is a critical challenge to develop automated methods for accurately and rapidly identifying DNA-binding proteins based on their sequence information alone. Here, a novel predictor, called "iDNA-Prot|dis", was established by incorporating the amino acid distance-pair coupling information and the amino acid reduced alphabet profile into the general pseudo amino acid composition (PseAAC) vector. The former can capture the characteristics of DNA-binding proteins and thereby enhance prediction quality, while the latter can reduce the dimension of the PseAAC vector and thereby speed up prediction. Rigorous jackknife and independent dataset tests showed that the new predictor outperformed the existing predictors for the same purpose. As a user-friendly web-server, iDNA-Prot|dis is accessible to the public at http://bioinformatics.hitsz.edu.cn/iDNA-Prot_dis/. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without needing to follow the complicated mathematical equations, which are presented in this paper only for the completeness of the development process. It is anticipated that the iDNA-Prot|dis predictor may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins, or at the very least, play a complementary role to the existing predictors in this regard.
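To make the two ideas in the abstract concrete, the following is a minimal Python sketch of distance-pair features computed over a reduced alphabet. It is an illustration under stated assumptions, not the authors' implementation: the seven-group alphabet in `REDUCED_GROUPS`, the function names, the default `max_dist=2`, and the frequency normalization are all hypothetical choices, since the abstract does not specify them.

```python
from itertools import product

# Hypothetical 7-letter reduced alphabet (an illustrative grouping of the
# 20 standard amino acids; the paper's actual clustering may differ).
REDUCED_GROUPS = {
    "A": "AGV",
    "I": "ILFP",
    "Y": "YMTS",
    "H": "HNQW",
    "R": "RK",
    "D": "DE",
    "C": "C",
}
AA_TO_GROUP = {aa: rep for rep, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group letter; non-standard residues are dropped
    (a simplification that slightly shifts distances around dropped positions)."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def distance_pair_features(seq, max_dist=2):
    """Return single-letter frequencies followed by, for each distance
    d = 1..max_dist, the frequency of every ordered letter pair whose
    positions are d apart in the reduced sequence."""
    reduced = reduce_sequence(seq)
    letters = sorted(REDUCED_GROUPS)
    pairs = list(product(letters, repeat=2))
    n = len(reduced)

    # Distance 0: plain composition of the reduced sequence.
    features = [reduced.count(a) / n if n else 0.0 for a in letters]

    for d in range(1, max_dist + 1):
        total = n - d  # number of residue pairs separated by distance d
        counts = dict.fromkeys(pairs, 0)
        for i in range(max(total, 0)):
            counts[(reduced[i], reduced[i + d])] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in pairs)
    return features


if __name__ == "__main__":
    vec = distance_pair_features("MKRAVDECMKRAVDEC", max_dist=2)
    print(len(vec))
```

With these assumed settings the vector has 7 + 2 × 49 = 105 components, whereas the same construction over the full 20-letter alphabet would have 20 + 2 × 400 = 820. This is the sense in which a reduced alphabet lowers the dimension of the feature vector.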
