Peng Zhenling, Li Zixia, Meng Qiaozhen, Zhao Bi, Kurgan Lukasz
Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China.
Frontier Science Center for Nonlinear Expectations, Ministry of Education, Qingdao, 266237, China.
Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac502.
One of the key features of intrinsically disordered regions (IDRs) is the facilitation of protein-protein and protein-nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with only a handful of methods predicting SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo a disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
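The general idea of combining several per-residue inputs with logistic regression and scoring the result by AUC can be illustrated with a minimal sketch. This is not the CLIP implementation: the three feature columns are synthetic stand-ins for co-evolutionary, physicochemical and disorder scores, the plain gradient-descent fit is an assumption made to keep the example dependency-free, and all function names (`fit_logistic`, `predict`, `auc`, `make_residues`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(X, w, b):
    # Per-residue propensity scores in (0, 1).
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_residues(n, rng):
    """Synthetic per-residue data: three made-up scores, each weakly informative."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        X.append([rng.gauss(0.8 * label, 1.0),   # stand-in: co-evolutionary score
                  rng.gauss(0.5 * label, 1.0),   # stand-in: physicochemical score
                  rng.gauss(0.5 * label, 1.0)])  # stand-in: disorder score
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_residues(400, rng)
X_test, y_test = make_residues(400, rng)

w, b = fit_logistic(X_train, y_train)
auc_combined = auc(predict(X_test, w, b), y_test)
auc_single = [auc([x[j] for x in X_test], y_test) for j in range(3)]
print("combined AUC:", round(auc_combined, 3))
print("single-input AUCs:", [round(a, 3) for a in auc_single])
```

Comparing `auc_combined` against the entries of `auc_single` mirrors, in a toy setting, the kind of ablation comparison the abstract describes. The numbers produced here reflect only the synthetic data and say nothing about CLIP's actual performance.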