IDRBP-PPCT: Identifying Nucleic Acid-Binding Proteins Based on Position-Specific Score Matrix and Position-Specific Frequency Matrix Cross Transformation.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2284-2293. doi: 10.1109/TCBB.2021.3069263. Epub 2022 Aug 8.

Abstract

DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two important classes of nucleic acid-binding proteins (NABPs) that play important roles in biological processes such as the replication, translation, and transcription of genetic material. Some proteins (DRBPs) bind to both DNA and RNA and also play a key role in gene expression. Identifying DBPs, RBPs, and DRBPs is important for studying protein-nucleic acid interactions. Computational methods are increasingly being proposed to identify DNA- or RNA-binding proteins automatically from protein sequences alone. One challenge is to design an effective protein representation method that converts protein sequences into fixed-dimension feature vectors. In this study, we propose a novel protein representation method, Position-Specific Scoring Matrix (PSSM) and Position-Specific Frequency Matrix (PSFM) Cross Transformation (PPCT). This representation captures the evolutionary information in the PSSM and the PSFM as well as the correlations between them. A new computational predictor, IDRBP-PPCT, combines PPCT with a two-layer framework based on the random forest algorithm to identify DBPs, RBPs, and DRBPs. Experimental results on the independent dataset and the tomato genome demonstrate the effectiveness of the proposed method. A user-friendly web server for IDRBP-PPCT is freely available at http://bliulab.net/IDRBP-PPCT.
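
The abstract does not give the PPCT formula, so the following is only an illustrative sketch of the general idea of turning two variable-length profile matrices into one fixed-dimension vector. It assumes a generic lagged cross-covariance between PSSM columns and PSFM columns; the function names, the `max_lag` parameter, and the normalization are hypothetical and should not be read as the authors' definition.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # one row per residue, one column per amino acid


def column_means(m: Matrix) -> List[float]:
    """Mean of every column, taken over all sequence positions."""
    n_rows = len(m)
    return [sum(row[c] for row in m) / n_rows for c in range(len(m[0]))]


def cross_features(pssm: Matrix, psfm: Matrix, max_lag: int = 2) -> List[float]:
    """Lagged cross-covariance between PSSM columns and PSFM columns.

    Hypothetical illustration only (not the published PPCT definition).
    The output length is n_pssm_cols * n_psfm_cols * max_lag, which does
    not depend on the protein length.
    """
    if len(pssm) != len(psfm):
        raise ValueError("PSSM and PSFM must cover the same residues")
    length = len(pssm)
    if length <= max_lag:
        raise ValueError("sequence is too short for the requested lag")

    mean_p = column_means(pssm)
    mean_f = column_means(psfm)
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for i in range(len(pssm[0])):
            for j in range(len(psfm[0])):
                # Pair position k in the PSSM with position k + lag in the PSFM.
                total = sum(
                    (pssm[k][i] - mean_p[i]) * (psfm[k + lag][j] - mean_f[j])
                    for k in range(length - lag)
                )
                features.append(total / (length - lag))
    return features
```

With real profiles of 20 columns each, this sketch would yield 400 values per lag regardless of sequence length, which is the fixed-dimension property the abstract refers to.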

