IEEE/ACM Trans Comput Biol Bioinform. 2021 Jul-Aug;18(4):1451-1463. doi: 10.1109/TCBB.2019.2952338. Epub 2021 Aug 6.
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two kinds of crucial proteins that are associated with various cellular activities and some important diseases. Accurate identification of DBPs and RBPs facilitates both theoretical research and real-world applications. Existing sequence-based DBP predictors can accurately identify DBPs but incorrectly predict many RBPs as DBPs, and existing RBP predictors likewise misclassify many DBPs as RBPs, resulting in low prediction precision. Moreover, some proteins that interact with both DNA and RNA (DRBPs) play important roles in gene expression but cannot be identified by existing computational methods. In this study, a two-level predictor named DeepDRBP-2L is proposed, combining a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). It is the first computational method able to identify DBPs, RBPs and DRBPs. Rigorous cross-validation and independent tests showed that DeepDRBP-2L can overcome this shortcoming of existing methods and can go a step further by identifying DRBPs. Application of DeepDRBP-2L to the tomato genome further demonstrated its performance. The webserver of DeepDRBP-2L is freely available at http://bliulab.net/DeepDRBP-2L.
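To make the precision problem concrete, the short Python sketch below computes precision = TP / (TP + FP) for a DBP predictor. The counts are hypothetical, chosen only for illustration and not taken from the study. They show how RBPs wrongly predicted as DBPs add to the false positives and pull precision down, even when the predictor finds most true DBPs.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted positives that are actually positive (0.0 if none predicted)."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


# Hypothetical counts for a DBP predictor (illustrative only, not results from the paper).
dbp_hits = 90       # true DBPs predicted as DBPs (true positives)
rbps_as_dbps = 40   # RBPs wrongly predicted as DBPs (cross-prediction false positives)
other_false = 5     # non-binding proteins wrongly predicted as DBPs

# Precision if RBPs were never confused with DBPs.
p_without_rbps = precision(dbp_hits, other_false)
# Precision once RBP cross-predictions are counted as false positives.
p_with_rbps = precision(dbp_hits, rbps_as_dbps + other_false)

print(f"{p_without_rbps:.3f}")  # 90 / 95
print(f"{p_with_rbps:.3f}")     # 90 / 135
```

With these made-up numbers, precision falls from about 0.947 to about 0.667 purely because of the RBP cross-predictions, while the number of correctly identified DBPs stays the same.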