IDP-CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields.

Affiliation

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.

Publication information

Int J Mol Sci. 2018 Aug 22;19(9):2483. doi: 10.3390/ijms19092483.

Abstract

Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and several computational predictors have been proposed to address it. Efficiently incorporating the sequence-order effect is critical for constructing an accurate predictor, because the distribution of disordered regions shows global sequence patterns. To capture these patterns, several sequence labelling models, such as conditional random fields (CRFs), have been applied in this field. However, these methods suffer from certain disadvantages. In this study, we propose a new computational predictor called IDP-CRF, which is trained on an updated benchmark dataset built from the MobiDB and DisProt databases and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), k-mers, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP-CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP-CRF is a useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP-CRF will facilitate the development of protein sequence analysis.
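The abstract frames disorder prediction as sequence labelling with a CRF. The snippet below is a minimal sketch of how a linear-chain CRF decodes per-residue labels with the Viterbi algorithm. It is not the IDP-CRF implementation. It assumes per-residue emission scores have already been computed from features such as PSSMs, k-mers, predicted secondary structure, and relative solvent accessibility. The function name `viterbi_decode`, the label encoding (0 = ordered, 1 = disordered), and all score values are hypothetical.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label path for a linear-chain CRF.

    Hypothetical inputs (illustration only):
      emissions[t][y]   -- score of label y at residue t (e.g. a weighted sum of features)
      transitions[a][b] -- score for label a at residue t-1 followed by label b at residue t
    Labels: 0 = ordered, 1 = disordered (assumed encoding).
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    # best[y]: score of the best partial path ending in label y at the current residue
    best = list(emissions[0])
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        ptrs: List[int] = []
        for y in range(n_labels):
            # choose the previous label that maximises the score of moving into y
            prev = max(range(n_labels), key=lambda a: best[a] + transitions[a][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            ptrs.append(prev)
        best = new_best
        backptrs.append(ptrs)
    # start from the best final label and follow the back-pointers
    y = max(range(n_labels), key=lambda a: best[a])
    path = [y]
    for ptrs in reversed(backptrs):
        y = ptrs[y]
        path.append(y)
    path.reverse()
    return path


# Usage: residues 3-4 have strong disorder scores, so they are labelled 1.
print(viterbi_decode([[2, 0], [2, 0], [0, 3], [0, 3], [2, 0]],
                     [[0.5, -0.5], [-0.5, 0.5]]))  # [0, 0, 1, 1, 0]
```

The transition scores let neighbouring residues influence one another, which is how a CRF captures sequence-order effects. For example, with emissions `[[1, 0], [0, 0.5], [1, 0]]` an independent per-residue argmax gives `[0, 1, 0]`. With transitions `[[1, -1], [-1, 1]]` that reward staying in the same state, Viterbi decoding instead returns `[0, 0, 0]`.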

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b7c7/6164615/f86cf83bc666/ijms-19-02483-g001.jpg