School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China.
Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae058.
Emerging clinical evidence suggests that complex associations between circular RNAs (circRNAs) and microRNAs (miRNAs) are critical regulators of various pathological processes and play a key role in most complex human diseases. However, identifying these associations through wet-lab experiments is error-prone and labor-intensive, and many existing computational methods for predicting novel circRNA-miRNA associations (CMAs) rely on only a single type of association data. To address the inadequacy of existing machine learning models, we propose a new model named BGF-CMAP, which combines a gradient boosting decision tree with natural language processing and graph embedding methods to infer associations between circRNAs and miRNAs. Specifically, BGF-CMAP extracts sequence attribute features with Word2vec and interaction behavior features with two homogeneous graph embedding algorithms, large-scale information network embedding and graph factorization. Comprehensive experiments showed that BGF-CMAP predicted circRNA-miRNA associations with an accuracy of 82.90% and an area under the receiver operating characteristic curve (AUC) of 0.9075. Furthermore, 23 of the top 30 miRNA-associated circRNAs predicted by the model were confirmed by relevant experimental studies, supporting the claim that BGF-CMAP outperforms other models. BGF-CMAP can serve as a useful model that provides a scientific and theoretical basis for the study of CMA prediction.
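To make two of the feature-extraction ideas named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes RNA sequences are split into overlapping k-mers before being passed to a Word2vec-style model, and it uses a simple symmetric stochastic gradient descent variant of graph factorization on a toy association graph. The function names (`kmer_tokens`, `graph_factorization`, `pair_features`), the hyperparameters, and the toy edges are all hypothetical. Word2vec training, large-scale information network embedding, and the gradient boosting classifier are omitted.

```python
import random


def kmer_tokens(seq, k=3):
    """Split an RNA sequence into overlapping k-mers.

    Assumption: k-mers serve as the "words" given to a Word2vec-style model.
    """
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def graph_factorization(edges, n_nodes, dim=4, lr=0.05, reg=0.01,
                        epochs=500, seed=0):
    """Learn node embeddings z so that <z_i, z_j> approximates edge weight w_ij.

    Objective (over observed edges only):
        0.5 * sum (w_ij - <z_i, z_j>)^2 + 0.5 * reg * sum ||z_i||^2
    Both endpoints are updated per edge, a symmetric variant chosen here
    for simplicity.
    """
    rng = random.Random(seed)
    z = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_nodes)]
    for _ in range(epochs):
        for i, j, w in edges:
            err = w - dot(z[i], z[j])
            zi_old = z[i][:]  # keep the pre-update copy for the z_j step
            for d in range(dim):
                z[i][d] += lr * (err * z[j][d] - reg * z[i][d])
                z[j][d] += lr * (err * zi_old[d] - reg * z[j][d])
    return z


def pair_features(z, circ_idx, mirna_idx):
    """Concatenate two node embeddings into one feature vector for a classifier."""
    return z[circ_idx] + z[mirna_idx]


# Toy graph: nodes 0-1 are circRNAs, nodes 2-3 are miRNAs (hypothetical data).
toy_edges = [(0, 2, 1.0), (0, 3, 1.0), (1, 3, 1.0)]
embeddings = graph_factorization(toy_edges, n_nodes=4)

print(kmer_tokens("ACGTAGC"))
print(len(pair_features(embeddings, 0, 2)))
```

In a fuller pipeline, the concatenated vector returned by `pair_features` would be joined with the sequence-derived features and passed to a gradient boosting classifier trained on known and sampled negative circRNA-miRNA pairs.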