Shanghai Key Laboratory of Forensic Medicine, Academy of Forensic Science; Key Laboratory of National Ministry of Health for Forensic Sciences, School of Medicine & Forensics, Health Science Center, Xi'an Jiaotong University, Xi'an, China.
Key Laboratory of National Ministry of Health for Forensic Sciences, School of Medicine & Forensics, Health Science Center, Xi'an Jiaotong University, Xi'an, China.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac046.
The lack of a reliable, easy-to-use pipeline for screening disease-related noncoding RNA regulatory axes is a problem that urgently needs to be solved. To address it, we designed a hybrid pipeline, disease-related lncRNA-miRNA-mRNA regulatory axis prediction from multiomics (DLRAPom), which identifies risk biomarkers and disease-related lncRNA-miRNA-mRNA regulatory axes by adding a novel machine learning model to conventional analysis and combining the results with experimental validation. The pipeline consists of four parts: selecting hub biomarkers by conventional bioinformatics analysis, identifying the most essential protein-coding biomarkers with a novel machine learning model, extracting the key lncRNA-miRNA-mRNA axis, and validating it experimentally. Our study is the first to propose a pipeline that predicts the interactions among lncRNA, miRNA and mRNA by combining WGCNA and XGBoost. In contrast to previously reported methods, we developed an Optimized XGBoost model that reduces the degree of overfitting on multiomics data, thereby improving the generalization ability of the overall model for integrated multiomics analysis. Applied to gestational diabetes mellitus (GDM), the pipeline predicted nine risk protein-coding biomarkers and several potential lncRNA-miRNA-mRNA regulatory axes, all of which were correlated with GDM. Among these axes, MALAT1/hsa-miR-144-3p/IRS1 was predicted to be the key axis and was identified as associated with GDM for the first time. In short, DLRAPom is a flexible pipeline that can contribute to research on the molecular pathogenesis of diseases by effectively predicting potential disease-related noncoding RNA regulatory networks and providing promising candidates for functional studies of disease pathogenesis.
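The axis-extraction step can be illustrated with a minimal sketch. It assumes that lncRNA-miRNA and miRNA-mRNA interaction pairs have already been obtained (for example, from interaction prediction databases) and that the hub protein-coding genes come from the earlier WGCNA/XGBoost stages. The function name `assemble_axes` and the placeholder identifiers `LNC_A`, `MIR_X`, `GENE_B` and `GENE_C` are hypothetical; only the MALAT1/hsa-miR-144-3p/IRS1 triplet comes from the text above. This is not the DLRAPom implementation, just a join of the two interaction lists on their shared miRNA.

```python
from collections import defaultdict


def assemble_axes(lnc_mirna, mirna_mrna, hub_mrnas):
    """Hypothetical helper: join lncRNA->miRNA and miRNA->mRNA pairs on the
    shared miRNA and keep only axes whose mRNA is a selected hub biomarker."""
    # Index miRNA -> set of target mRNAs for constant-time lookup.
    targets = defaultdict(set)
    for mirna, mrna in mirna_mrna:
        targets[mirna].add(mrna)

    hubs = set(hub_mrnas)
    axes = set()  # a set removes duplicate triplets from redundant input pairs
    for lnc, mirna in lnc_mirna:
        # .get() does not insert missing keys into the defaultdict.
        for mrna in targets.get(mirna, ()):
            if mrna in hubs:
                axes.add((lnc, mirna, mrna))
    # Sort so the output order is deterministic.
    return sorted(axes)


# Illustrative input: placeholder pairs plus the axis named in the abstract.
lnc_mirna = [("MALAT1", "hsa-miR-144-3p"), ("LNC_A", "MIR_X")]
mirna_mrna = [("hsa-miR-144-3p", "IRS1"), ("MIR_X", "GENE_B"), ("MIR_X", "GENE_C")]
hub_mrnas = ["IRS1", "GENE_C"]

axes = assemble_axes(lnc_mirna, mirna_mrna, hub_mrnas)
print(axes)
# [('LNC_A', 'MIR_X', 'GENE_C'), ('MALAT1', 'hsa-miR-144-3p', 'IRS1')]
```

A real pipeline would likely also filter candidate axes by expression correlation (for example, requiring the lncRNA and mRNA to be positively correlated and each negatively correlated with the miRNA); that step is omitted here.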