School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.
Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China.
Bioinformatics. 2022 Feb 7;38(5):1252-1260. doi: 10.1093/bioinformatics/btab810.
Intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. IDRs are divided into long disordered regions (LDRs) and short disordered regions (SDRs) according to their lengths. Previous studies have shown that LDRs and SDRs have different properties. However, existing computational methods fail to extract different features for LDRs and SDRs separately. As a result, they achieve unstable performance on datasets with different ratios of LDRs and SDRs.
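The length-based split into LDRs and SDRs can be illustrated with a minimal Python sketch. It assumes per-residue annotations given as a string of '1' (disordered) and '0' (ordered), and a hypothetical cutoff `ldr_min_len=30`; neither the label format nor the cutoff is stated in this abstract, and the function names are illustrative only.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open span: (start, end)


def disordered_regions(labels: str) -> List[Region]:
    """Return spans of consecutive '1' (disordered) residues."""
    regions: List[Region] = []
    start = None
    for i, ch in enumerate(labels):
        if ch == "1" and start is None:
            start = i                      # a disordered run begins
        elif ch != "1" and start is not None:
            regions.append((start, i))     # the run ended at i (exclusive)
            start = None
    if start is not None:                  # run extends to the sequence end
        regions.append((start, len(labels)))
    return regions


def split_by_length(regions: List[Region],
                    ldr_min_len: int = 30) -> Tuple[List[Region], List[Region]]:
    """Split regions into (LDRs, SDRs); the cutoff is an assumption."""
    ldrs = [r for r in regions if r[1] - r[0] >= ldr_min_len]
    sdrs = [r for r in regions if r[1] - r[0] < ldr_min_len]
    return ldrs, sdrs


if __name__ == "__main__":
    # Toy annotation: one 4-residue run and one 35-residue run.
    labels = "0" * 5 + "1" * 4 + "0" * 10 + "1" * 35 + "0" * 3
    ldrs, sdrs = split_by_length(disordered_regions(labels))
    print("LDRs:", ldrs)
    print("SDRs:", sdrs)
```

Counting the two lists over a whole dataset gives the LDR-to-SDR ratio that the abstract identifies as a source of unstable performance.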
In this study, a two-layer predictor called DeepIDP-2L is proposed. In the first layer, two kinds of attention-based models are used to extract different features for LDRs and SDRs, respectively. A hierarchical attention network captures the distribution pattern features of LDRs, and a convolutional attention network captures the local correlation features of SDRs. The second layer of DeepIDP-2L maps the features extracted in the first layer into a new feature space. A convolutional network and bidirectional long short-term memory are used to capture the local and long-range information for predicting both SDRs and LDRs. Experimental results show that DeepIDP-2L achieves more stable performance than other existing predictors on independent test sets with different ratios of SDRs and LDRs.
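Both first-layer models are described as attention-based. The following is a minimal sketch of generic dot-product attention pooling in plain Python, included only to show the basic idea of weighting residue features by learned relevance. It is not the DeepIDP-2L architecture: the `query` vector (which a trained model would learn), the feature layout, and the function names are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: List[Vector], query: Vector) -> Vector:
    """Pool per-residue feature vectors into a single vector.

    Each residue is scored by its dot product with `query`; the scores are
    normalised with softmax and used as weights in a weighted sum.
    """
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * vec[d] for w, vec in zip(weights, features))
            for d in range(dim)]


if __name__ == "__main__":
    # Three residues with 2-dimensional toy features (hypothetical values).
    window = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(attention_pool(window, query=[0.5, -0.5]))
```

With an all-zero query every residue receives the same weight, so the pooled vector reduces to the plain mean of the features; a non-zero query shifts the weight toward residues whose features align with it.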
For the convenience of experimental scientists, a user-friendly and publicly accessible web server for the new predictor has been established at http://bliulab.net/DeepIDP-2L/. It is anticipated that DeepIDP-2L will become a useful tool for identifying intrinsically disordered regions.
Supplementary data are available at Bioinformatics online.