Meng Fanchi, Kurgan Lukasz
Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada.
Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada; Department of Computer Science, Virginia Commonwealth University, Richmond, 23284, U.S.A.
Bioinformatics. 2016 Jun 15;32(12):i341-i350. doi: 10.1093/bioinformatics/btw280.
Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They differ from flexible linkers/residues in that they are disordered and longer. The availability of experimentally annotated DFLs provides an opportunity to build high-throughput computational predictors of these regions from protein sequences. To date, there are no computational methods that directly predict DFLs, and they can be found only indirectly by filtering predicted flexible residues with predictions of disorder.
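The indirect route described above reduces to intersecting two per-residue binary predictions. A minimal sketch, assuming each predictor has already been run and thresholded to one boolean per residue; `indirect_dfl_calls` is a hypothetical name, and no specific flexibility or disorder predictor is implied:

```python
from typing import List, Sequence


def indirect_dfl_calls(flexible: Sequence[bool], disordered: Sequence[bool]) -> List[bool]:
    """Call a residue a putative DFL only if it is predicted both flexible and disordered."""
    if len(flexible) != len(disordered):
        raise ValueError("both prediction tracks must cover the same residues")
    # Element-wise AND: the disorder predictions filter the flexible-residue predictions.
    return [f and d for f, d in zip(flexible, disordered)]


# Toy five-residue example with made-up (hypothetical) predictions.
flexible = [True, True, False, True, True]
disordered = [True, False, False, True, False]
print(indirect_dfl_calls(flexible, disordered))  # [True, False, False, True, False]
```

Such a filter only yields binary calls and inherits the errors of both underlying predictors, which is the gap a dedicated DFL predictor addresses.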
We conceptualized, developed and empirically assessed a first-of-its-kind sequence-based predictor of DFLs, DFLpred. This method outputs the propensity to form DFLs for each residue in the input sequence. DFLpred uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions, which are processed by a fast linear model. Our high-throughput predictor can be used on the whole-proteome scale; it needs less than 1 h to predict an entire proteome on a single CPU. When assessed on an independent test dataset of low sequence-identity proteins, it achieves an area under the receiver operating characteristic curve (AUC) of 0.715 and outperforms existing alternatives, including methods for the prediction of flexible linkers, flexible residues and intrinsically disordered residues, as well as various combinations of these methods. Prediction on the complete human proteome reveals that about 10% of proteins have a high DFL content, with over 30% of their residues in DFLs. We also estimate that about 6000 DFL regions are long, spanning ≥30 consecutive residues.
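The pipeline summarized above (per-residue features derived from propensity scales, a linear model, then proteome-level summaries and AUC-based evaluation) can be sketched as follows. This is an illustration under stated assumptions, not DFLpred itself: the scales, weights, bias, window size (`half_window`) and every function name are hypothetical, DFLpred's actual features and parameters are not reproduced here, and the AUC uses the standard rank-based (Mann-Whitney) formulation.

```python
# Illustrative sketch only; NOT DFLpred's features, weights or code.
from typing import List, Mapping, Sequence, Tuple


def window_average(values: Sequence[float], half_window: int) -> List[float]:
    """Average each position with its neighbours; the window is truncated at the termini."""
    n = len(values)
    averaged = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged


def linear_propensity(seq: str, scales: Sequence[Mapping[str, float]],
                      weights: Sequence[float], bias: float,
                      half_window: int = 7) -> List[float]:
    """Per-residue score = bias + sum_k weights[k] * (window-averaged scale k at that residue)."""
    if len(scales) != len(weights):
        raise ValueError("need exactly one weight per feature scale")
    # One smoothed feature track per (hypothetical) scale; residues absent from a scale count as 0.
    tracks = [window_average([scale.get(aa, 0.0) for aa in seq], half_window)
              for scale in scales]
    return [bias + sum(w * track[i] for w, track in zip(weights, tracks))
            for i in range(len(seq))]


def dfl_content(calls: Sequence[bool]) -> float:
    """Fraction of residues predicted as DFL (0.0 for an empty sequence)."""
    return sum(calls) / len(calls) if calls else 0.0


def long_regions(calls: Sequence[bool], min_len: int = 30) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs of consecutive DFL calls that are at least min_len long."""
    regions, start = [], None
    for i, is_dfl in enumerate(calls):
        if is_dfl and start is None:
            start = i
        elif not is_dfl and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(calls) - start >= min_len:
        regions.append((start, len(calls)))
    return regions


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as the probability that a random DFL residue outscores a random non-DFL residue (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs both DFL and non-DFL residues")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With binary calls obtained as `[s > t for s in scores]` for some hypothetical threshold `t`, a protein would fall into the high-content group when `dfl_content(calls) > 0.3`, and `long_regions(calls)` lists its segments of ≥30 consecutive DFL residues. A linear score over a few smoothed tracks costs O(L) per sequence of length L (for a fixed window), which is consistent with the whole-proteome runtimes the abstract reports.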
http://biomine.ece.ualberta.ca/DFLpred/
Supplementary data are available at Bioinformatics online.