School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China.
Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Hefei, 230036, China.
Sci Rep. 2024 Sep 3;14(1):20490. doi: 10.1038/s41598-024-68897-4.
MicroRNAs (miRNAs) are a key class of endogenous non-coding RNAs that play a pivotal role in disease regulation. Accurately predicting the relationships between miRNAs and diseases has profound implications for disease diagnosis, treatment, and prevention. However, these prediction tasks are highly challenging because the underlying relationships are complex. Although numerous effective prediction models exist for validating these associations, they often suffer from information distortion because they retain information inefficiently during the encoding-decoding process. Inspired by the multi-layer heterogeneous graph Transformer and the XGBoost machine learning classifier, this study introduces a novel computational approach for miRNA-disease association prediction based on a multi-layer heterogeneous encoder and machine learning decoder structure (MHXGMDA). First, we use multi-view similarity matrices as the input encoding for MHXGMDA. Next, we use the multi-layer heterogeneous encoder to learn embeddings of miRNAs and diseases, aiming to capture as many relevant features as possible. Finally, the information from all layers is concatenated and passed to the machine learning classifier, so that as much of the encoded detail as possible is preserved. We compared seven different classifier models and selected the XGBoost algorithm as the decoder. This algorithm uses the miRNA and disease embedding features to decode and predict association scores between miRNAs and diseases. We applied MHXGMDA to predict human miRNA-disease associations on two benchmark datasets. Experimental results show that our approach outperforms several leading methods in terms of both the area under the receiver operating characteristic curve and the area under the precision-recall curve.
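The step in which information from all encoder layers is concatenated before decoding can be illustrated with a minimal sketch. It is not the authors' implementation: the random arrays stand in for encoder outputs, the layer count and dimensions are arbitrary, and joining the miRNA and disease vectors side by side for each candidate pair is an assumption, since the abstract does not state how the two embeddings are combined. The names `concat_layers` and `pair_features` are hypothetical.

```python
import numpy as np


def concat_layers(layer_embeddings):
    """Concatenate per-layer embeddings (each n_nodes x d_l) along the feature axis."""
    return np.concatenate(layer_embeddings, axis=1)


def pair_features(mirna_emb, disease_emb, pairs):
    """Build one feature row per (miRNA index, disease index) pair.

    Assumption: the pair representation is the miRNA vector followed by
    the disease vector; the abstract does not specify this detail.
    """
    m_idx = np.array([m for m, _ in pairs])
    d_idx = np.array([d for _, d in pairs])
    return np.hstack([mirna_emb[m_idx], disease_emb[d_idx]])


# Placeholder encoder outputs: 3 layers, 8 features per layer (arbitrary sizes).
rng = np.random.default_rng(0)
n_mirna, n_disease = 5, 4
layer_dims = [8, 8, 8]
mirna_layers = [rng.normal(size=(n_mirna, d)) for d in layer_dims]
disease_layers = [rng.normal(size=(n_disease, d)) for d in layer_dims]

# Keep every layer's output instead of only the last one.
mirna_emb = concat_layers(mirna_layers)      # shape (5, 24)
disease_emb = concat_layers(disease_layers)  # shape (4, 24)

# Candidate (miRNA, disease) pairs to score.
pairs = [(0, 1), (2, 3), (4, 0)]
X = pair_features(mirna_emb, disease_emb, pairs)
print(X.shape)  # (3, 48)
```

A matrix such as `X`, together with known association labels, is the kind of input a gradient-boosted tree classifier such as XGBoost would be trained on to produce association scores.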