Yu Haihao, Yu Yue, Xia Yanling
Computer Science and Technology College, Heilongjiang Institute of Technology, No. 999 Hongqi Street, Harbin 150009, China.
College of Animal Science, Jilin University, No. 1977 Xinzhu Road, Changchun 130012, China.
Genes (Basel). 2025 Mar 31;16(4):413. doi: 10.3390/genes16040413.
Circular RNA (circRNA) is a type of noncoding RNA whose ends are covalently joined into a closed loop. It is an endogenous RNA found in animals and plants and is produced by RNA splicing: at back-splicing sites, the 5' and 3' ends of exons are joined to form the circle. Circular RNA plays an important regulatory role in disease by interacting with associated miRNAs. Accurate identification of circular RNA can enrich the available circular RNA data and provide new ideas for drug development. Current mainstream circular RNA recognition algorithms fall into two categories: those based on RNA sequence position information and those based on biological feature information of the RNA sequence. Herein, we propose a circular RNA recognition method called circ2LO. It uses the LucaOne large model to embed the splicing sites of RNA sequences together with their upstream and downstream sequences, which avoids the loss of semantic information caused by traditional one-hot encoding. It then applies a convolutional layer to extract features and a self-attention mechanism to extract interaction features, so that the core features of circular RNA at the splicing sites are captured accurately. Finally, a fully connected layer makes the circular RNA prediction. circ2LO reached an accuracy of 95.47% on the human dataset, higher than that of existing methods. It also achieved accuracies of 97.04% and 72.04% on the Arabidopsis and mouse datasets, respectively, demonstrating good robustness. These validation results indicate that circ2LO identifies circular RNAs with high accuracy and may serve as a potentially transformative analytical platform for circRNA research.
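The pipeline described in the abstract (encode the splice-site windows, convolve, apply self-attention, classify with a fully connected layer) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. The window construction, layer sizes, mean pooling, and random weights are all assumptions made for illustration, and the function names (`flank_window`, `forward`, and so on) are hypothetical. Because the abstract does not describe the LucaOne interface, the sketch substitutes plain one-hot encoding at the point where circ2LO would use LucaOne embeddings, which is exactly the encoding the method is designed to replace.

```python
import math
import random

ALPHABET = "ACGU"

def flank_window(seq, site, flank):
    """Return the 2*flank bases centred on `site`, padding with 'N' past the ends."""
    padded = "N" * flank + seq + "N" * flank
    return padded[site:site + 2 * flank]

def one_hot(window):
    """One-hot rows per base; 'N' becomes all zeros. Stand-in for LucaOne embeddings."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in window]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def conv1d_relu(x, w, k):
    """Valid 1-D convolution over positions followed by ReLU.
    x: L x C_in, w: (k * C_in) x C_out, result: (L - k + 1) x C_out."""
    patches = [[v for row in x[i:i + k] for v in row] for i in range(len(x) - k + 1)]
    return [[max(0.0, v) for v in row] for row in matmul(patches, w)]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over positions."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul([softmax(row) for row in scores], v)

def classify(x, w_out, b_out):
    """Mean-pool over positions, then one fully connected unit with a sigmoid."""
    pooled = [sum(col) / len(x) for col in zip(*x)]
    logit = sum(p * w for p, w in zip(pooled, w_out)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def forward(seq, donor, acceptor, flank=8, hidden=6, kernel=3, seed=0):
    """Untrained forward pass: random weights, so the output is not a real prediction."""
    rng = random.Random(seed)
    # Assumption: concatenate the windows around the two back-splice sites.
    window = flank_window(seq, acceptor, flank) + flank_window(seq, donor, flank)
    x = one_hot(window)
    h = conv1d_relu(x, rand_matrix(rng, kernel * len(ALPHABET), hidden), kernel)
    h = self_attention(h, *(rand_matrix(rng, hidden, hidden) for _ in range(3)))
    return classify(h, [rng.uniform(-0.5, 0.5) for _ in range(hidden)], 0.0)

# Toy sequence and splice-site positions, chosen arbitrarily for the example.
seq = "AUGGCUAGGUAAGUCCAGGCUUAGCAGGUACGAUCG"
print(forward(seq, donor=20, acceptor=5))
```

The printed value is a number between 0 and 1 produced by untrained weights, so it only shows how data flows through the three stages. In a trained model the weights would be learned from labelled circRNA and non-circRNA examples, and the one-hot rows would be replaced by context-aware embedding vectors.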