Chuang Kai-Cheng, Cheng Ping-Sung, Tsai Yu-Hung, Tsai Meng-Hsiun
Department of Life Sciences, National Chung Hsing University, Taichung, 402, Taiwan.
Department of Management Information Systems, National Chung Hsing University, Taichung, 402, Taiwan.
BMC Genom Data. 2025 Jan 14;26(1):4. doi: 10.1186/s12863-024-01293-z.
MicroRNAs (miRNAs) are endogenous RNAs 18 to 24 nucleotides long that play critical roles in gene regulation and disease progression. Although traditional wet-lab experiments provide direct evidence for miRNA-disease associations, they are often time-consuming, and their results are complicated to analyze with current bioinformatics tools. In recent years, machine learning (ML) and deep learning (DL) techniques have become powerful tools for analyzing large-scale biological data. Hence, developing a model to predict, identify, and rank associations between miRNAs and diseases can significantly enhance the precision and efficiency of investigating these relationships.
In this study, we used experimentally derived miRNA-disease association data to develop a DL model for predicting miRNA-disease associations. The datasets were obtained from HMDD (the Human miRNA Disease Database). To improve prediction accuracy, we introduced two labeling strategies, weight-based and majority-based definitions, to categorize the associations. After preprocessing, the data were used to train a novel model combining gated recurrent units (GRU) and a graph convolutional network (GCN) to predict the level of miRNA-disease associations. Using regression analysis and multiclass classification, we classified the associations into three groups: "upregulated", "downregulated", and "nonspecific". The combined GRU-GCN model achieved a robust area under the curve (AUC) score of 0.8 in all datasets, demonstrating its efficacy in predicting potential miRNA-disease relationships.
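The abstract names the two labeling strategies but does not define them, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that each miRNA-disease pair has several experimental reports, each giving a direction ("up" or "down"); that the majority-based label is the more frequently reported direction, with ties falling to "nonspecific"; and that the weight-based label thresholds the mean of +1/-1 scores. The record layout, the placeholder names, and the threshold of 0.5 are all hypothetical.

```python
from collections import Counter

# Hypothetical evidence records: (miRNA, disease, reported direction).
# Names and values are placeholders, not data from HMDD.
RECORDS = [
    ("miR-A", "disease-X", "up"),
    ("miR-A", "disease-X", "up"),
    ("miR-A", "disease-X", "down"),
    ("miR-B", "disease-Y", "down"),
    ("miR-C", "disease-Z", "up"),
    ("miR-C", "disease-Z", "down"),
]


def group_by_pair(records):
    """Collect the reported directions for each (miRNA, disease) pair."""
    grouped = {}
    for mirna, disease, direction in records:
        grouped.setdefault((mirna, disease), []).append(direction)
    return grouped


def majority_label(directions):
    """Assumed majority rule: the more frequent direction wins, ties are nonspecific."""
    counts = Counter(directions)
    if counts["up"] > counts["down"]:
        return "upregulated"
    if counts["down"] > counts["up"]:
        return "downregulated"
    return "nonspecific"


def weight_score(directions):
    """Assumed weight: mean of +1 (up) and -1 (down) over all reports, in [-1, 1]."""
    values = {"up": 1.0, "down": -1.0}
    return sum(values.get(d, 0.0) for d in directions) / len(directions)


def weight_label(directions, threshold=0.5):
    """Assumed weight rule: a label is assigned only when the mean score is decisive."""
    score = weight_score(directions)
    if score >= threshold:
        return "upregulated"
    if score <= -threshold:
        return "downregulated"
    return "nonspecific"


if __name__ == "__main__":
    for pair, directions in group_by_pair(RECORDS).items():
        print(pair, majority_label(directions), weight_label(directions))
```

Under these assumptions the two strategies can disagree: a pair with two "up" reports and one "down" report is "upregulated" by majority but "nonspecific" by weight, since its mean score of 1/3 is below the hypothetical threshold.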
By introducing innovative label-preprocessing methods, this study clarified the relationships between miRNAs and diseases and reduced the ambiguity among results from different experiments. Based on these refined label definitions, we developed a DL-based model to refine and predict associations between miRNAs and diseases. This model offers a valuable tool for complementing traditional experimental methods and enhancing our understanding of miRNA-related disease mechanisms.