School of Information Engineering, East China Jiaotong University, Nanchang, China.
School of Information Science and Engineering, Shandong Normal University, Jinan, China.
BMC Genomics. 2024 Jan 18;25(1):73. doi: 10.1186/s12864-024-09998-2.
Long noncoding RNAs (lncRNAs) are integral to a plethora of critical cellular biological processes, including the regulation of gene expression, cell differentiation, and the development of tumors and cancers. Predicting the relationships between lncRNAs and diseases can contribute to a better understanding of the pathogenic mechanisms of disease and provide strong support for the development of advanced treatment methods.
Therefore, we present NAGTLDA, a Node-Adaptive Graph Transformer model for predicting unknown lncRNA-disease associations. First, we use the node-adaptive feature smoothing (NAFS) method to learn the local feature information of nodes, and we encode the structural information of the fused similarity network of diseases and lncRNAs using Structural Deep Network Embedding (SDNE). Next, a Transformer module captures potential association information between the network nodes. Finally, a Transformer module with two multi-head attention layers learns a global-level fusion of the embeddings. The network structure encoding is added as a structural inductive bias to compensate for the message-passing mechanism that the Transformer lacks. In 5-fold cross-validation, NAGTLDA achieved an average AUC of 0.9531 and AUPR of 0.9537, significantly higher than state-of-the-art methods. In case studies on 4 diseases, 55 of 60 lncRNA-disease associations were validated in the literature. The results demonstrate the enormous potential of the graph Transformer structure to incorporate graph structural information for uncovering unknown lncRNA-disease associations.
Our proposed NAGTLDA model can serve as a highly efficient computational method for predicting associations between biological entities.
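The evaluation protocol reported above (AUC and AUPR averaged over 5-fold cross-validation) can be illustrated with a minimal, dependency-free sketch. This is not the authors' evaluation code. The function names `roc_auc`, `average_precision`, and `k_fold_indices` are hypothetical, and AUPR is approximated here by average precision, which is one common estimator and may differ from the one used in the paper.

```python
import random


def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of precision@rank taken at each positive.

    Assumption: used as a stand-in for AUPR. Ties are broken by input order.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]
```

In a full pipeline, each fold returned by `k_fold_indices` would be held out in turn, the model retrained on the remaining associations, and the two metrics computed on the held-out scores before averaging across folds.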