IDRBP-PPCT: Identifying Nucleic Acid-Binding Proteins Based on Position-Specific Score Matrix and Position-Specific Frequency Matrix Cross Transformation.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2284-2293. doi: 10.1109/TCBB.2021.3069263. Epub 2022 Aug 8.

Abstract

DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two important classes of nucleic acid-binding proteins (NABPs), which play important roles in biological processes such as the replication, translation, and transcription of genetic material. Some proteins (DRBPs) bind to both DNA and RNA and also play a key role in gene expression. Identifying DBPs, RBPs, and DRBPs is important for studying protein-nucleic acid interactions. Computational methods are increasingly being proposed to identify DNA- or RNA-binding proteins automatically from protein sequences alone. One challenge is to design an effective protein representation method that converts protein sequences into fixed-dimension feature vectors. In this study, we proposed a novel protein representation method called Position-Specific Scoring Matrix (PSSM) and Position-Specific Frequency Matrix (PSFM) Cross Transformation (PPCT). This method captures the evolutionary information in the PSSM and PSFM as well as the correlations between them. A new computational predictor, IDRBP-PPCT, combines PPCT with a two-layer framework based on the random forest algorithm to identify DBPs, RBPs, and DRBPs. Experimental results on the independent dataset and the tomato genome demonstrated the effectiveness of the proposed method. A user-friendly web server for IDRBP-PPCT is freely available at http://bliulab.net/IDRBP-PPCT.
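
To illustrate how two per-position profile matrices of variable length can be mapped to a fixed-dimension vector that also reflects their correlations, the sketch below computes a lagged cross-covariance between every PSSM column and every PSFM column. This is an assumption made for illustration only: the abstract does not give the PPCT formula, and the function name `cross_transform`, the `max_lag` parameter, and the toy matrices are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per amino acid type


def column_means(m: Matrix) -> List[float]:
    """Mean of each column over all sequence positions."""
    n_rows = len(m)
    return [sum(row[c] for row in m) / n_rows for c in range(len(m[0]))]


def cross_transform(pssm: Matrix, psfm: Matrix, max_lag: int = 2) -> List[float]:
    """Hypothetical cross-covariance between PSSM and PSFM columns.

    NOT the published PPCT definition; an illustrative stand-in.
    Output length is cols(pssm) * cols(psfm) * max_lag, independent
    of the protein length.
    """
    if len(pssm) != len(psfm):
        raise ValueError("PSSM and PSFM must cover the same sequence positions")
    length = len(pssm)
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")

    mean_s = column_means(pssm)
    mean_f = column_means(psfm)
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for i in range(len(pssm[0])):
            for j in range(len(psfm[0])):
                # Covariance of PSSM column i with PSFM column j, shifted by `lag`.
                total = sum(
                    (pssm[k][i] - mean_s[i]) * (psfm[k + lag][j] - mean_f[j])
                    for k in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


def toy_profile(length: int, seed: int) -> Matrix:
    """Random stand-in for a real profile (20 amino acid columns)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(20)] for _ in range(length)]


if __name__ == "__main__":
    short = cross_transform(toy_profile(30, 0), toy_profile(30, 1))
    long_ = cross_transform(toy_profile(75, 2), toy_profile(75, 3))
    # Both proteins yield vectors of the same dimension despite different lengths.
    print(len(short), len(long_))
```

Vectors of this kind could then be passed to a classifier such as a random forest. In practice the PSSM and PSFM would come from a profile search tool instead of random numbers.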

