College of Software, Dalian Jiaotong University, 794 Huanghe Road, Dalian 116028, China.
College of Science, Dalian Jiaotong University, 794 Huanghe Road, Dalian 116028, China.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae627.
Noncoding RNA is RNA that does not encode proteins. Two of its classes, lncRNA and miRNA, play crucial regulatory roles in organisms, and their aberrant expression is closely related to various diseases. Traditional experimental methods for validating the interactions of these RNAs have limitations, and existing prediction models are limited in functionality: they rely on isolated feature extraction and perform poorly on the various types of small-sample tasks. This paper proposes an improved de Bruijn graph that injects RNA structural information into the graph while preserving sequence information. By introducing richer edge relationships, the improved de Bruijn graph also enables graph neural networks to learn broader dependencies and correlations among the data. In addition, this paper proposes a multitask learning model, DVMnet, that handles multiple related tasks; its parameters are optimized on the combined total loss of three tasks. This enables multitask prediction of RNA interactions, disease associations, and subcellular localization. Compared with the best existing models in this field, DVMnet achieves the best performance, with a 3% improvement in area under the curve (AUC), and gives robust results in predicting disease associations and subcellular localization. The improved de Bruijn graph is also applicable to other scenarios and can unify the sequence and structural information of various nucleic acids in a single graph.
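To make the idea of a structure-aware de Bruijn graph concrete, the following is a minimal sketch and not the construction used in DVMnet, which the abstract does not specify. It assumes that nodes are k-mers, that consecutive k-mers are joined by sequence edges, and that secondary structure is supplied in dot-bracket notation, with each base pair adding a second edge type between the k-mers that start at the paired positions. The function name `build_structure_debruijn`, the edge labels `"seq"` and `"pair"`, and the clamping rule for positions near the 3' end are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_structure_debruijn(seq, dot_bracket, k=3):
    """Return {(src_kmer, dst_kmer, kind): count} for one RNA.

    kind is "seq" for ordinary de Bruijn edges between consecutive
    k-mers and "pair" for edges derived from base pairs (assumption:
    this is one illustrative way to inject structure, not the paper's).
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")

    # k-mers in sequence order; kmers[i] starts at position i.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    last = len(kmers) - 1
    edges = defaultdict(int)

    # Sequence edges: classic de Bruijn adjacency between neighbours.
    for src, dst in zip(kmers, kmers[1:]):
        edges[(src, dst, "seq")] += 1

    # Structure edges: match brackets with a stack to recover base pairs.
    stack = []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            i = stack.pop()
            # Positions past the final k-mer start are clamped to the
            # last k-mer (hypothetical rule for this sketch).
            edges[(kmers[min(i, last)], kmers[min(j, last)], "pair")] += 1
    if stack:
        raise ValueError("unbalanced dot-bracket string")

    return dict(edges)


if __name__ == "__main__":
    graph = build_structure_debruijn("GGGAAACCC", "(((...)))", k=3)
    for edge, count in sorted(graph.items()):
        print(edge, count)
```

Keeping the edge type in the key lets a downstream graph neural network treat sequence adjacency and base pairing as distinct relations, which is one plausible reading of the "richer edge relationships" described above.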