Zhang Yaowu, Jin Xiu, Zhang Xiaodan
College of Information and Artificial Intelligence, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China.
Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China.
Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf453.
Small nucleolar RNAs (snoRNAs) play crucial roles in a wide range of biological processes, and studying their association with diseases can enhance our understanding of disease pathogenesis. Nevertheless, current knowledge of these associations is limited, and traditional biological experiments are both costly and time-consuming. Consequently, developing efficient computational methods is essential for predicting potential snoRNA-disease associations. We propose a novel method for predicting snoRNA-disease associations based on non-negative matrix factorization and graph convolution (GCNMF-SDA). First, five different types of similarity information from snoRNA and disease entities are introduced to fully mine and refine the feature information. Then the snoRNA and disease similarity networks are integrated using the nonlinear Similarity Network Fusion (SNF) approach, while the weighted K nearest known neighbors (WKNKN) algorithm is applied to optimize the snoRNA-disease association matrix. Following this, the graph convolution module and the non-negative matrix factorization module extract disease features and snoRNA features, respectively. The extracted features are then combined into a composite feature vector for each snoRNA-disease pair. Finally, the composite feature vectors, along with their corresponding labels, are input into a multilayer perceptron for training. Our experiments, conducted using five-fold cross-validation, show that the GCNMF-SDA model achieves an area under the receiver operating characteristic curve (AUC-ROC) of 0.9659 and an area under the precision-recall curve (AUC-PR) of 0.9522. Furthermore, most of the novel associations identified by GCNMF-SDA were validated through case studies, underscoring the method's reliability in predicting potential relationships between snoRNAs and diseases.
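As an illustration of the WKNKN preprocessing step mentioned in the abstract, the following is a minimal NumPy sketch and not the authors' implementation. The function names (`wknkn`, `neighbor_estimate`), the parameters `K` and `eta`, and the toy matrices are assumptions made for this example. It also simplifies the published algorithm by taking the K most similar neighbors regardless of whether they have known associations.

```python
import numpy as np

def wknkn(Y, S_row, S_col, K=3, eta=0.7):
    """Simplified WKNKN sketch: soften zeros in an association matrix.

    Y     : (n_rows, n_cols) binary association matrix (e.g. snoRNA x disease)
    S_row : (n_rows, n_rows) similarity between row entities
    S_col : (n_cols, n_cols) similarity between column entities
    K     : number of nearest neighbors (hypothetical default)
    eta   : decay applied to less similar neighbors (hypothetical default)
    """
    Y = np.asarray(Y, dtype=float)
    S_row = np.asarray(S_row, dtype=float)
    S_col = np.asarray(S_col, dtype=float)

    def neighbor_estimate(A, S):
        # A holds one association profile per row; S compares those rows.
        n = A.shape[0]
        k = min(K, n - 1)                   # cannot have more neighbors than other rows
        decay = eta ** np.arange(k)         # eta^0, eta^1, ..., eta^(k-1)
        est = np.zeros_like(A)
        for i in range(n):
            sims = S[i].copy()
            sims[i] = -np.inf               # never pick the row itself
            nn = np.argsort(-sims)[:k]      # indices of the k most similar rows
            weights = decay * S[i, nn]      # similarity, discounted by rank
            norm = S[i, nn].sum()
            if norm > 0:
                est[i] = (weights @ A[nn]) / norm
        return est

    Y_row = neighbor_estimate(Y, S_row)         # estimate from similar row entities
    Y_col = neighbor_estimate(Y.T, S_col).T     # estimate from similar column entities
    # Known associations are kept; zeros are replaced by the averaged estimate.
    return np.maximum(Y, (Y_row + Y_col) / 2.0)

# Toy example: 3 row entities, 2 column entities (made-up values).
Y = np.array([[1, 0], [0, 0], [0, 1]])
S_row = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
S_col = np.array([[1.0, 0.5], [0.5, 1.0]])
filled = wknkn(Y, S_row, S_col, K=2, eta=0.7)
```

In this toy case the second row entity has no known associations, so its row in `filled` receives small positive scores borrowed from its most similar neighbors, while the known entries of `Y` are left unchanged.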