IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):307-320. doi: 10.1109/TCBB.2022.3150280. Epub 2023 Feb 3.
Recognizing DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) is conducive to understanding cell function, but it is also a challenging task. Previous studies have usually considered these proteins separately because of their different binding domains. In addition, because DBPs and RBPs are highly similar, a DBP predictor may predict RBPs as DBPs, and vice versa, which leads to a high cross-prediction rate. In this study, we propose a deep multi-label joint learning framework that leverages the relationship between multiple labels and binding proteins. First, a multi-label variant network is designed to explore multi-scale hidden context information. Then, a multi-label Long Short-Term Memory network (multiLSTM) is used to mine the potential relationships between labels. Finally, the calibrated hidden features from the variant network are used for joint learning at different levels, so that multiLSTM can better explore the correlations between them. Extensive experiments are carried out to compare the proposed method with existing methods. Furthermore, we provide insights into the importance of the relevant bioanalysis of the proteins obtained from our model and summarize the binding proteins that are significantly related to disease. Our method is freely available at http://39.108.90.186/dmlj.
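To make the cross-prediction problem mentioned in the abstract concrete, the following is a minimal Python sketch of how such a rate could be computed for a DBP predictor. The abstract does not give a formula, so the definition used here (the fraction of RBP-only proteins that a DBP predictor flags as DBPs) is an assumption, and the function name `cross_prediction_rate`, its parameters, and the toy data are hypothetical illustrations rather than part of the published method.

```python
from typing import Sequence, Set


def cross_prediction_rate(
    true_labels: Sequence[Set[str]],
    predicted_positive: Sequence[bool],
    target: str = "DBP",
    other: str = "RBP",
) -> float:
    """Assumed definition: among proteins that carry only the `other` label
    (not `target`), the fraction the `target` predictor flags as positive."""
    # Keep predictions only for proteins that are `other` but not `target`.
    other_only = [
        pred
        for labels, pred in zip(true_labels, predicted_positive)
        if other in labels and target not in labels
    ]
    if not other_only:
        raise ValueError("no proteins labelled only as the other class")
    return sum(other_only) / len(other_only)


# Toy data (invented for illustration): true label sets for seven proteins
# and the yes/no output of a hypothetical DBP predictor.
true_labels = [{"DBP"}, {"RBP"}, {"RBP"}, {"DBP", "RBP"}, set(), {"RBP"}, {"RBP"}]
dbp_predictions = [True, True, False, True, False, False, True]

rate = cross_prediction_rate(true_labels, dbp_predictions)
print(f"DBP predictor cross-prediction rate on RBPs: {rate:.2f}")
```

Proteins that bind both DNA and RNA are excluded from the denominator in this sketch, since predicting them as DBPs is correct under a multi-label view; a different convention would change the number.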