Wang Shengchang, Qiao Jiaqing, Feng Shou
School of Electronic and Information Engineering, Harbin Institute of Technology, Harbin, 150001, China.
College of Information and Communication Engineering, Harbin Engineering University, Harbin, 150001, China.
Sci Rep. 2024 Mar 2;14(1):5185. doi: 10.1038/s41598-024-55957-y.
Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nucleotides. A growing body of evidence shows that lncRNAs are closely linked to diseases. To compensate for the shortcomings of traditional methods, researchers have begun collecting relevant biological data in databases and using bioinformatics tools to predict associations between lncRNAs and diseases, which has greatly improved research efficiency. To improve on the prediction accuracy of current methods, we propose ResGCN-A, a new lncRNA-disease association prediction method with an attention mechanism. First, we integrate lncRNA functional similarity, lncRNA Gaussian interaction profile kernel similarity, disease semantic similarity, and disease Gaussian interaction profile kernel similarity to obtain a comprehensive lncRNA similarity and a comprehensive disease similarity. Second, a residual graph convolutional network extracts the local features of lncRNAs and diseases. Third, a new attention mechanism assigns weights to these features to obtain the potential features of lncRNAs and diseases. Finally, the potential features are concatenated to form the training set for an Extra-Trees classifier, and the trained classifier predicts potential lncRNA-disease associations. By combining the residual graph convolutional network with the attention mechanism, ResGCN-A fuses the local and global features of lncRNAs and diseases, which helps to obtain more accurate features and improves prediction accuracy. In the experiments, ResGCN-A was compared with five other methods under 5-fold cross-validation. ResGCN-A achieved an AUC of 0.9916 and an AUPR of 0.9951, outperforming the other five methods. In addition, case studies and a robustness evaluation show that ResGCN-A is an effective method for predicting lncRNA-disease associations.
The source code for ResGCN-A will be available at https://github.com/Wangxiuxiun/ResGCN-A.
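The pipeline described in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation: the function names (`gip_kernel_similarity`, `residual_gcn_layer`, `build_pair_dataset`), the bandwidth convention for the Gaussian kernel, the single untrained graph-convolution step, and the synthetic association matrix are all assumptions made for illustration. The functional and semantic similarities and the attention mechanism are omitted because the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def gip_kernel_similarity(profiles):
    """Gaussian interaction profile kernel similarity between rows of `profiles`."""
    sq_norms = np.sum(profiles ** 2, axis=1)
    # Assumed convention: bandwidth is the inverse of the mean squared profile norm.
    mean_sq = sq_norms.mean()
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def residual_gcn_layer(sim, feats, weight):
    """One graph-convolution step with a skip connection: ReLU(A_hat H W) + H."""
    a = sim + np.eye(sim.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))   # symmetric degree normalisation
    a_hat = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ feats @ weight, 0.0) + feats


def build_pair_dataset(lnc_feats, dis_feats, assoc):
    """Concatenate lncRNA and disease features for every (lncRNA, disease) pair."""
    n_l, n_d = assoc.shape
    rows, cols = np.meshgrid(np.arange(n_l), np.arange(n_d), indexing="ij")
    X = np.hstack([lnc_feats[rows.ravel()], dis_feats[cols.ravel()]])
    y = assoc.ravel().astype(int)
    return X, y


# Synthetic 20 x 15 association matrix (random toy data, not from any database).
rng = np.random.default_rng(0)
assoc = (rng.random((20, 15)) < 0.2).astype(float)

lnc_sim = gip_kernel_similarity(assoc)      # lncRNA-lncRNA similarity
dis_sim = gip_kernel_similarity(assoc.T)    # disease-disease similarity

# Untrained random weights stand in for learned GCN parameters (assumption).
lnc_feats = residual_gcn_layer(lnc_sim, lnc_sim, 0.1 * rng.standard_normal((20, 20)))
dis_feats = residual_gcn_layer(dis_sim, dis_sim, 0.1 * rng.standard_normal((15, 15)))

X, y = build_pair_dataset(lnc_feats, dis_feats, assoc)

# Extra-Trees classifier evaluated with 5-fold cross-validation, scored by AUC.
clf = ExtraTreesClassifier(n_estimators=50, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print("mean AUC on toy data:", scores.mean())
```

Because the toy similarities are computed from the full association matrix before the folds are split, the printed score says nothing about real performance; the sketch only shows how the stages connect.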