Liu Yumeng, Chen Shengyu, Wang Xiaolong, Liu Bin
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China.
School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN 47408, USA.
Mol Ther Nucleic Acids. 2019 Sep 6;17:396-404. doi: 10.1016/j.omtn.2019.06.004. Epub 2019 Jun 15.
Accurate identification of intrinsically disordered proteins and regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed IDP-CRF, a sequence labeling model based on conditional random fields (CRFs), exhibits state-of-the-art performance in predicting IDPs/IDRs. Motivated by these methods, we propose a predictor called IDP-FSP, an ensemble of three CRF-based predictors: IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are designed to predict long, short, and generic disordered regions, respectively, and each is constructed from different features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with predictors tailored to IDRs of different lengths. Experimental results on two independent test datasets show that IDP-FSP achieves predictive performance better than or at least comparable to that of 26 existing state-of-the-art methods in this field, demonstrating the effectiveness of IDP-FSP.
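The ensemble idea can be illustrated with a minimal Python sketch. The abstract does not state how the outputs of IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G are combined, so the unweighted averaging below is an assumption made purely for illustration, as are the function names `average_ensemble` and `disordered_regions`, the 0.5 threshold, and the example probabilities. The sketch assumes each predictor emits one disorder probability per residue.

```python
from typing import List, Tuple


def average_ensemble(prob_tracks: List[List[float]]) -> List[float]:
    """Average per-residue disorder probabilities across predictors.

    Assumption: unweighted averaging; the abstract does not specify
    the actual combination rule used by IDP-FSP.
    """
    length = len(prob_tracks[0])
    if any(len(track) != length for track in prob_tracks):
        raise ValueError("all probability tracks must cover the same residues")
    return [sum(column) / len(prob_tracks) for column in zip(*prob_tracks)]


def disordered_regions(
    probs: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans where probability >= threshold."""
    regions = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a disordered run begins here
        elif p < threshold and start is not None:
            regions.append((start, i))  # the run ended at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(probs)))  # run extends to the sequence end
    return regions


# Hypothetical per-residue outputs for a six-residue sequence.
long_track = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1]
short_track = [0.6, 0.7, 0.8, 0.3, 0.2, 0.9]
generic_track = [0.9, 0.9, 0.9, 0.1, 0.0, 0.8]

combined = average_ensemble([long_track, short_track, generic_track])
print(disordered_regions(combined))
```

In this toy input the short-region track is the main driver of the single-residue region at the C-terminus, which hints at why predictors specialized by region length might complement one another.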