Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach.

Affiliation

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen 518055, Guangdong, China.

Publication information

J Biomol Struct Dyn. 2015;33(8):1720-30. doi: 10.1080/07391102.2014.968624. Epub 2014 Oct 28.

DOI:10.1080/07391102.2014.968624

PMID:25252709

Abstract

DNA-binding proteins are crucial for various cellular processes and have therefore become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to establish an automated method for rapidly and accurately identifying DNA-binding proteins from their sequence information alone. Because all biological species evolved from a very limited number of ancestral species, evolutionary information should be taken into account when developing such a high-throughput tool. In view of this, a new predictor was proposed that incorporates evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. Comparisons with existing methods under both the jackknife test and an independent data-set test showed that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach of incorporating evolutionary information into the formulation of statistical samples could be used to identify many other protein attributes as well.
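
The abstract names the top-n-gram approach but does not define it. The following is a minimal Python sketch of one plausible reading, offered under stated assumptions and not as the authors' implementation. It assumes that each protein is represented by a frequency profile, meaning one row of 20 amino-acid probabilities per sequence position, presumably derived from a PSI-BLAST search. It further assumes that the top-n-gram of a position is the string of its n most probable residues, listed most probable first, and that the feature vector is the occurrence frequency of every possible top-n-gram. The names `AMINO_ACIDS`, `profile_row`, `top_n_grams` and `top_n_gram_features`, the column order, and the toy profile are all hypothetical.

```python
from collections import Counter
from itertools import permutations

# Assumed (hypothetical) column order of each profile row: the 20 standard amino acids.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def profile_row(probs):
    """Toy helper: build a 20-entry row from a {residue: probability} dict."""
    return [probs.get(aa, 0.0) for aa in AMINO_ACIDS]


def top_n_grams(profile, n):
    """Return one top-n-gram per position of a frequency profile.

    Each row holds 20 probabilities in AMINO_ACIDS order. The top-n-gram
    of a position is its n most probable residues, most probable first.
    Ties are broken by AMINO_ACIDS order because sorted() is stable.
    """
    if not 1 <= n <= len(AMINO_ACIDS):
        raise ValueError("n must be between 1 and 20")
    grams = []
    for row in profile:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each profile row must have 20 entries")
        order = sorted(range(len(AMINO_ACIDS)), key=lambda i: -row[i])
        grams.append("".join(AMINO_ACIDS[i] for i in order[:n]))
    return grams


def top_n_gram_features(profile, n):
    """Occurrence frequency of each possible top-n-gram in the profile.

    The vocabulary is every ordered choice of n distinct residues
    (20 features for n=1, 380 for n=2), because the top-n residues of a
    position are always distinct.
    """
    if not profile:
        raise ValueError("profile must contain at least one position")
    counts = Counter(top_n_grams(profile, n))
    total = len(profile)
    return [counts["".join(g)] / total for g in permutations(AMINO_ACIDS, n)]


if __name__ == "__main__":
    toy = [
        profile_row({"A": 0.6, "R": 0.3, "N": 0.1}),
        profile_row({"R": 0.7, "A": 0.2, "D": 0.1}),
        profile_row({"A": 0.5, "R": 0.4, "C": 0.1}),
    ]
    print(top_n_grams(toy, 2))               # ['AR', 'RA', 'AR']
    print(len(top_n_gram_features(toy, 2)))  # 380
```

Normalizing the counts by sequence length yields a fixed-length vector for every protein, whatever its length, so that a standard classifier can consume it. Whether the original work keeps the residues inside a gram ordered, and which vocabulary it uses, is not stated in this abstract. Both are assumptions of this sketch.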
