Shen Zhen, Liu Wei, Zhao ShuJun, Zhang QinHu, Wang SiGuo, Yuan Lin
School of Computer and Software, Nanyang Institute of Technology, Nanyang, Henan, China.
EIT Institute for Advanced Study, Ningbo, Zhejiang, China.
Front Genet. 2023 Oct 6;14:1283404. doi: 10.3389/fgene.2023.1283404. eCollection 2023.
CircRNA-protein binding plays a critical role in complex biological activities and in disease. Various deep learning-based algorithms have been proposed to identify circRNA-protein binding sites. These methods predict, at the sequence level, whether a circRNA sequence contains protein binding sites, and they focus primarily on analysing the sequence specificity of circRNA-protein binding. In terms of performance, however, they are unsatisfactory at accurately predicting motif sites that have special functions in gene expression. In this study, drawing on deep learning models that perform pixel-level binary classification in computer vision, we treated circRNA-protein binding site prediction as a nucleotide-level binary classification task and used a fully convolutional neural network to identify circRNA-protein binding motif sites (CPBFCN). CPBFCN provides a new way to predict circRNA motifs. Using the MEME tool and existing circRNA-related and protein-related databases, we analysed the functions of the motifs discovered by CPBFCN. We also investigated the correlation between circRNA sponges and motif distribution. Furthermore, by comparing motif distributions across different input sequence lengths, we found that some motifs in the flanking sequences of circRNA-protein binding regions may contribute to circRNA-protein binding. This study contributes to identifying circRNA-protein binding and helps in understanding the role of circRNA-protein binding in gene expression regulation.
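The abstract does not describe CPBFCN's layers, kernel sizes or training procedure, so the following is only a hypothetical toy meant to illustrate what "nucleotide-level binary classification" means: a fully convolutional network emits one probability per input position instead of one label per sequence. The two layers, their hand-set weights, the made-up "GGA" motif, and the helper names `predict_labels` and `labels_to_sites` are all illustrative assumptions, not the authors' model. Collapsing runs of positive positions into site intervals is likewise an assumed post-processing step.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
NUCLEOTIDES = "ACGU"


def one_hot(seq: str) -> Matrix:
    """Encode an RNA sequence as an L x 4 one-hot matrix (DNA 'T' is read as 'U')."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


def conv1d_same(x: Matrix, kernel: Matrix, bias: float) -> List[float]:
    """One output channel, odd kernel width K, zero 'same' padding.

    x is L x C and kernel is K x C; the output has length L, so every
    nucleotide position receives its own score (no dense layer, no pooling).
    """
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        total = bias
        for j, weights in enumerate(kernel):
            pos = i + j - pad
            if 0 <= pos < len(x):
                total += sum(w * v for w, v in zip(weights, x[pos]))
        out.append(total)
    return out


def predict_labels(seq: str, threshold: float = 0.5) -> List[int]:
    """Hypothetical two-layer fully convolutional toy model with hand-set weights."""
    # Layer 1: width-3 filter whose ReLU output is 1.0 only on the middle base of "GGA".
    g, a = NUCLEOTIDES.index("G"), NUCLEOTIDES.index("A")
    k1 = [[0.0] * 4 for _ in range(3)]
    k1[0][g] = k1[1][g] = k1[2][a] = 1.0
    h = [max(0.0, s) for s in conv1d_same(one_hot(seq), k1, bias=-2.0)]
    # Layer 2: width-3 all-ones filter that spreads each detection to its
    # neighbours, so all three nucleotides of the motif get a positive logit.
    logits = conv1d_same([[v] for v in h], [[1.0], [1.0], [1.0]], bias=-0.5)
    # Per-nucleotide sigmoid + threshold: one binary label per input position.
    return [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in logits]


def labels_to_sites(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse runs of positive labels into half-open (start, end) intervals."""
    sites, start = [], None
    for i, label in enumerate(labels):
        if label and start is None:
            start = i
        elif not label and start is not None:
            sites.append((start, i))
            start = None
    if start is not None:
        sites.append((start, len(labels)))
    return sites


labels = predict_labels("UUAGGAUUCCGGAUU")
print(labels)                   # [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0]
print(labels_to_sites(labels))  # [(3, 6), (10, 13)]
```

Because the model contains only convolutions with "same" padding, the output length always equals the input length, which is the property that lets an FCN label every nucleotide (the 1D analogue of pixel-level segmentation). A trained network would learn its filters from labelled binding sites rather than use hand-set weights.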