Guan Zhihao, Jin Xiu, Zhang Xiaodan
College of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China.
Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China.
J Chem Inf Model. 2025 Apr 14;65(7):3324-3342. doi: 10.1021/acs.jcim.5c00174. Epub 2025 Mar 24.
Noncoding RNAs (ncRNAs), including piwi-interacting RNA (piRNA), long noncoding RNA (lncRNA), microRNA (miRNA), small nucleolar RNA (snoRNA), and circular RNA (circRNA), contribute significantly to gene expression regulation and are key factors in disease association studies and health-related research. Accurate prediction of ncRNA-disease associations is crucial for elucidating disease mechanisms and advancing therapeutic development. Recently, many computational models based on graph neural networks (GNNs) have emerged for identifying associations between various ncRNAs and diseases. However, existing computational models have not fully utilized integrative information on ncRNAs and diseases, and relying on GNN-based models alone may limit performance because of oversmoothing. In addition, existing models mainly target a specific type of ncRNA and may not be applicable to most ncRNAs. To overcome these limitations, we propose MFF-nDA, a computational model based on multimodule fusion. Specifically, we first introduce five types of similarity network information, comprising three types of ncRNA similarity and two types of disease similarity, to fully explore and optimize the multisource feature information on these entities. Subsequently, we establish three modules: a heterogeneous network representation module based on Transformer, an association network representation module based on a graph convolutional network (GCN), and a topological structure representation module based on a graph attention network (GAT). Together, these modules capture diverse node features in heterogeneous networks and the topological structure information reflected in association networks. The complementary effects of the three modules also help relieve the oversmoothing issue to some extent.
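The abstract does not say how the five similarity views are integrated, how node features are built, or how the module outputs are fused. The following is a minimal plain-Python sketch of the general idea under explicit assumptions: similarity views are merged by element-wise averaging, node features are the rows of the heterogeneous matrix [[S_r, A], [A^T, S_d]], a single symmetric-normalized GCN layer stands in for a graph module, and fusion is concatenation followed by a sigmoid inner-product score. All function names (`mean_similarity`, `heterogeneous_features`, `bipartite_adjacency`, `gcn_layer`, `fuse`, `score`) are hypothetical, the Transformer and GAT modules are not reproduced, and the toy matrices are made up.

```python
import math

# Hypothetical sketch: names, fusion rule and toy data are assumptions, not the MFF-nDA code.


def mean_similarity(*mats):
    """Integrate several n x n similarity views by element-wise averaging (assumed rule)."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)] for i in range(n)]


def heterogeneous_features(rna_sim, dis_sim, assoc):
    """Node features as the rows of [[S_r, A], [A^T, S_d]] (assumed construction)."""
    r, d = len(rna_sim), len(dis_sim)
    top = [list(rna_sim[i]) + [float(assoc[i][j]) for j in range(d)] for i in range(r)]
    bottom = [[float(assoc[i][j]) for i in range(r)] + list(dis_sim[j]) for j in range(d)]
    return top + bottom


def bipartite_adjacency(assoc):
    """Build the (r + d) x (r + d) adjacency [[0, A], [A^T, 0]] from an r x d association matrix A."""
    r, d = len(assoc), len(assoc[0])
    adj = [[0.0] * (r + d) for _ in range(r + d)]
    for i in range(r):
        for j in range(d):
            adj[i][r + j] = adj[r + j][i] = float(assoc[i][j])
    return adj


def matmul(a, b):
    """Plain-Python product of an m x k matrix and a k x n matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # every degree is >= 1 because of the self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in matmul(matmul(norm, feats), weight)]


def fuse(*embeddings):
    """Concatenate each node's embeddings from several modules (assumed fusion rule)."""
    return [[x for e in embeddings for x in e[i]] for i in range(len(embeddings[0]))]


def score(fused, rna_idx, disease_idx, num_rna):
    """Predicted association probability: sigmoid of the inner product of two node embeddings."""
    u, v = fused[rna_idx], fused[num_rna + disease_idx]
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(u, v))))


# Toy data (made up): two similarity views per entity type, 3 ncRNAs x 2 diseases.
rna_sim = mean_similarity([[1, 0.2, 0.4], [0.2, 1, 0.6], [0.4, 0.6, 1]],
                          [[1, 0.6, 0.0], [0.6, 1, 0.2], [0.0, 0.2, 1]])
dis_sim = mean_similarity([[1, 0.3], [0.3, 1]], [[1, 0.5], [0.5, 1]])
assoc = [[1, 0], [0, 1], [1, 1]]

adj = bipartite_adjacency(assoc)
feats = heterogeneous_features(rna_sim, dis_sim, assoc)
h1 = gcn_layer(adj, feats, [[0.5, 0.1] for _ in range(5)])  # stand-in for one module's output
h2 = gcn_layer(adj, h1, [[1.0, 0.0], [0.0, 1.0]])           # stand-in for a second module
fused = fuse(h1, h2)
print(round(score(fused, rna_idx=0, disease_idx=1, num_rna=3), 4))
```

Concatenating the outputs of different modules, rather than only stacking more GCN layers, keeps less-smoothed representations available to the scorer; this loosely mirrors the abstract's point that complementary modules help with oversmoothing, but the actual MFF-nDA fusion may differ.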
By leveraging multimodule fusion learning to comprehensively capture the diverse features of these entities, our model outperforms available state-of-the-art methods, achieving an AUC greater than 0.9000 on every dataset. This is the highest predictive performance among the compared methods and makes MFF-nDA a valuable tool for identifying potential disease-associated ncRNAs. The code of MFF-nDA is available at https://github.com/Jack-Cxy/MFF-nDA.
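The AUC reported above is the area under the ROC curve. As a reminder of what that number measures, here is a minimal sketch that computes it through its rank interpretation: the probability that a randomly chosen known (positive) ncRNA-disease pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical; the abstract does not describe the evaluation protocol (for example, cross-validation folds or how negative pairs are sampled).

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties count 1/2.

    Hypothetical helper for illustration; labels are 1 (known association) or 0 (negative pair).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Compare every positive/negative pair: win = 1, tie = 0.5, loss = 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for two known and two negative ncRNA-disease pairs.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop costs O(P·N) for P positives and N negatives; for large test sets, a sort-based rank computation yields the same value more efficiently.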