A random forest based computational model for predicting novel lncRNA-disease associations.

Affiliations

School of Software and Microelectronics, Harbin University of Science and Technology, Harbin, 150080, China.

College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, 150050, China.

Publication information

BMC Bioinformatics. 2020 Mar 27;21(1):126. doi: 10.1186/s12859-020-3458-1.

DOI:10.1186/s12859-020-3458-1

PMID:32216744

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7099795/

Abstract

BACKGROUND

Accumulating evidence shows that abnormal regulation of long non-coding RNA (lncRNA) is associated with various human diseases. Accurately identifying disease-associated lncRNAs helps in studying the mechanisms of lncRNAs in diseases and in exploring new therapies. Many lncRNA-disease association (LDA) prediction models have been implemented by integrating multiple kinds of data resources. However, most existing models ignore the interference of noisy and redundant information among these data resources.

RESULTS

To improve LDA prediction, we implemented a prediction model based on random forest and feature selection (RFLDA for short). First, RFLDA integrates experimentally supported miRNA-disease associations (MDAs) and LDAs, disease semantic similarity (DSS), lncRNA functional similarity (LFS), and lncRNA-miRNA interactions (LMI) as input features. Then, RFLDA selects the most useful features for training the prediction model according to the random forest variable importance score, which takes into account not only the effect of each individual feature on the prediction results but also the joint effects of multiple features. Finally, a random forest regression model is trained to score potential lncRNA-disease associations. Under 5-fold cross-validation, RFLDA achieves an area under the receiver operating characteristic curve (AUC) of 0.976 and an area under the precision-recall curve (AUPR) of 0.779, outperforming several state-of-the-art LDA prediction models. Moreover, in case studies on three cancers, 43 of the 45 lncRNAs predicted by RFLDA are validated by experimental data, and the other two are supported by other LDA prediction models.
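The abstract names the five data sources but does not spell out how they are combined into one feature vector per lncRNA-disease pair. The sketch below assumes plain concatenation of the lncRNA's and the disease's profiles, and it zeroes the pair's own LDA entry so the label cannot leak into the features; both choices are assumptions made for illustration, not details confirmed by the abstract. The toy matrices and the importance scores are made up, and the helper names (pair_features, top_k_columns) are hypothetical.

```python
# Toy data: 3 lncRNAs, 2 diseases, 2 miRNAs. All values are made up.
LFS = [[1.0, 0.4, 0.1],
       [0.4, 1.0, 0.3],
       [0.1, 0.3, 1.0]]   # lncRNA x lncRNA functional similarity
DSS = [[1.0, 0.2],
       [0.2, 1.0]]        # disease x disease semantic similarity
LDA = [[1, 0],
       [0, 1],
       [0, 0]]            # lncRNA x disease known associations
LMI = [[1, 0],
       [1, 1],
       [0, 1]]            # lncRNA x miRNA interactions
MDA = [[1, 0],
       [0, 1]]            # miRNA x disease known associations


def column(matrix, j):
    """Return column j of a row-major matrix as a list."""
    return [row[j] for row in matrix]


def pair_features(i, j):
    """Concatenate the profiles of lncRNA i and disease j into one vector.

    Assumed layout: [LFS row i | LDA row i | LMI row i |
                     DSS row j | LDA column j | MDA column j].
    The (i, j) association itself is zeroed so the label cannot leak in.
    """
    lda_row = list(LDA[i])
    lda_col = column(LDA, j)
    lda_row[j] = 0
    lda_col[i] = 0
    return LFS[i] + lda_row + LMI[i] + DSS[j] + lda_col + column(MDA, j)


def top_k_columns(importances, k):
    """Indices of the k largest importance scores, in ascending index order."""
    ranked = sorted(range(len(importances)), key=lambda c: -importances[c])
    return sorted(ranked[:k])


features = pair_features(0, 0)   # feature vector for lncRNA 0, disease 0
label = LDA[0][0]                # regression target: known association or not

# Placeholder importance scores, one per feature column. In a real pipeline
# these would come from a trained random forest, which is not shown here.
importances = [0.20, 0.05, 0.01, 0.00, 0.00, 0.15, 0.02,
               0.25, 0.04, 0.00, 0.00, 0.00, 0.18, 0.10]
keep = top_k_columns(importances, k=4)
selected = [features[c] for c in keep]
print(len(features), label, keep, selected)
```

In a full pipeline the reduced vectors would then be passed to a random forest regressor, whose predicted value serves as the association score for each unlabeled pair.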

CONCLUSIONS

Cross-validation and case studies indicate that RFLDA has an excellent ability to identify potential disease-associated lncRNAs.
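For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUC and AUPR can be computed from predicted scores, together with a simple 5-fold index split. It uses the pairwise-ranking definition of AUC and average precision as a step-wise estimate of AUPR; the paper may use a different estimator (for example trapezoidal integration), so this is illustrative only. The labels and scores are made up, and the function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample.

    Assumes distinct scores; tied scores are ordered arbitrarily.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Made-up labels (1 = known association) and model scores for six pairs.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
folds = k_fold_indices(10, k=5)
print(round(auc, 4), round(aupr, 4), folds)
```

In cross-validation, each fold in turn is held out for scoring while the model is trained on the remaining folds, and the metrics are computed on the held-out predictions.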

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4da1/7099795/bb8d575c46cd/12859_2020_3458_Fig1_HTML.jpg
