IPCARF: improving lncRNA-disease association prediction using incremental principal component analysis feature selection and a random forest classifier.

Affiliations

School of Computer Science, Qufu Normal University, Rizhao, China.

Department of Internet of Things Engineering, Wuxi Taihu University, Wuxi, China.

Publication information

BMC Bioinformatics. 2021 Apr 1;22(1):175. doi: 10.1186/s12859-021-04104-9.

DOI: 10.1186/s12859-021-04104-9
PMID: 33794766
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8017839/
Abstract

BACKGROUND

Identifying lncRNA-disease associations not only helps clarify the mechanisms of human diseases at the lncRNA level but also speeds up the identification of potential biomarkers for disease diagnosis, treatment, prognosis, and drug response prediction. However, as the amount of archived biological data continues to grow, it has become increasingly difficult to detect potential human lncRNA-disease associations in these very large datasets using traditional experimental methods. Consequently, developing new and effective computational methods to predict potential human lncRNA-disease associations is essential.

RESULTS

We propose IPCARF, a new machine learning algorithm for predicting lncRNA-disease associations that combines incremental principal component analysis (IPCA) with a random forest (RF) classifier and integrates multiple similarity matrices. First, we used two different models to compute a semantic similarity matrix of diseases from a directed acyclic graph of diseases. Second, we obtained a feature vector for each lncRNA-disease pair by integrating disease similarity, lncRNA similarity, and Gaussian kernel similarity. Third, we applied IPCA to reduce the dimensionality of the original feature set and obtain the best feature subspace. Finally, we trained an RF model to predict potential lncRNA-disease associations. The experimental results show that IPCARF improves the AUC for predicting potential lncRNA-disease associations. Under 10-fold cross-validation, IPCARF reached an AUC of 0.8529 before parameter optimization and 0.8611 after the optimal parameters were selected by grid search.
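The steps described above can be illustrated with a minimal sketch in Python using scikit-learn. It is not the authors' implementation. The association matrix is synthetic, the only similarity used is a Gaussian interaction profile kernel (the disease semantic similarity and lncRNA functional similarity from the abstract are omitted), and the bandwidth convention, the number of components, the forest size, and the helper names `gaussian_kernel_similarity` and `pair_features` are all assumptions chosen for illustration. Because the similarities here are computed from the full association matrix, the cross-validated AUC leaks label information and should be read only as a demonstration of the mechanics.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def gaussian_kernel_similarity(profiles):
    """Gaussian kernel between the rows (interaction profiles) of a binary matrix."""
    sq_norms = (profiles ** 2).sum(axis=1)
    # Assumed convention: bandwidth normalised by the mean squared profile norm.
    gamma = 1.0 / max(sq_norms.mean(), 1e-12)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def pair_features(lnc_sim, dis_sim, pairs):
    """Concatenate the lncRNA similarity row and the disease similarity row per pair."""
    return np.array([np.concatenate([lnc_sim[i], dis_sim[j]]) for i, j in pairs])


# Synthetic lncRNA-disease association matrix (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n_lnc, n_dis = 40, 30
assoc = (rng.random((n_lnc, n_dis)) < 0.15).astype(float)

lnc_sim = gaussian_kernel_similarity(assoc)    # shape (n_lnc, n_lnc)
dis_sim = gaussian_kernel_similarity(assoc.T)  # shape (n_dis, n_dis)

# Known associations are positives; an equal number of unknown pairs are sampled as negatives.
positives = np.argwhere(assoc == 1)
negatives = np.argwhere(assoc == 0)
negatives = negatives[rng.choice(len(negatives), size=len(positives), replace=False)]
pairs = np.vstack([positives, negatives])
labels = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])

X = pair_features(lnc_sim, dis_sim, pairs)

# IPCA reduces the feature dimension, then a random forest classifies each pair.
model = make_pipeline(
    IncrementalPCA(n_components=20, batch_size=64),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, labels, cv=cv, scoring="roc_auc")
mean_auc = auc_scores.mean()
print(f"mean 10-fold AUC: {mean_auc:.4f}")
```

Placing IPCA inside the pipeline means the projection is refit on each training fold, so the dimensionality reduction itself does not see the held-out pairs. A grid search over parameters such as `n_components` or `n_estimators` could be layered on top by wrapping the same pipeline in scikit-learn's `GridSearchCV`.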

CONCLUSIONS

We compared IPCARF with the existing LRLSLDA, LRLSLDA-LNCSIM, TPGLDA, NPCMF, and ncPred prediction methods, which have shown excellent performance in predicting lncRNA-disease associations. Under 10-fold cross-validation, IPCARF produced better predictions than all of the compared methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/383d/8017839/c3a5d05dd63f/12859_2021_4104_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/383d/8017839/cd540f8be800/12859_2021_4104_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/383d/8017839/45e24d588a58/12859_2021_4104_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/383d/8017839/ae68c07b499a/12859_2021_4104_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/383d/8017839/5f0efaee199d/12859_2021_4104_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/383d/8017839/81634e716c82/12859_2021_4104_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/383d/8017839/43874c4ad9eb/12859_2021_4104_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/383d/8017839/e80f4a887630/12859_2021_4104_Fig8_HTML.jpg

Similar articles

1
IPCARF: improving lncRNA-disease association prediction using incremental principal component analysis feature selection and a random forest classifier.
BMC Bioinformatics. 2021 Apr 1;22(1):175. doi: 10.1186/s12859-021-04104-9.
2
A random forest based computational model for predicting novel lncRNA-disease associations.
BMC Bioinformatics. 2020 Mar 27;21(1):126. doi: 10.1186/s12859-020-3458-1.
3
CRlncRC: a machine learning-based method for cancer-related long noncoding RNA identification using integrated features.
BMC Med Genomics. 2018 Dec 31;11(Suppl 6):120. doi: 10.1186/s12920-018-0436-9.
4
A Learning-Based Method for LncRNA-Disease Association Identification Combing Similarity Information and Rotation Forest.
iScience. 2019 Sep 27;19:786-795. doi: 10.1016/j.isci.2019.08.030. Epub 2019 Aug 23.
5
IDSSIM: an lncRNA functional similarity calculation model based on an improved disease semantic similarity method.
BMC Bioinformatics. 2020 Jul 31;21(1):339. doi: 10.1186/s12859-020-03699-9.
6
Prediction of lncRNA and disease associations based on residual graph convolutional networks with attention mechanism.
Sci Rep. 2024 Mar 2;14(1):5185. doi: 10.1038/s41598-024-55957-y.
7
LDNFSGB: prediction of long non-coding rna and disease association using network feature similarity and gradient boosting.
BMC Bioinformatics. 2020 Sep 3;21(1):377. doi: 10.1186/s12859-020-03721-0.
8
A novel target convergence set based random walk with restart for prediction of potential LncRNA-disease associations.
BMC Bioinformatics. 2019 Dec 3;20(1):626. doi: 10.1186/s12859-019-3216-4.
9
LDAEXC: LncRNA-Disease Associations Prediction with Deep Autoencoder and XGBoost Classifier.
Interdiscip Sci. 2023 Sep;15(3):439-451. doi: 10.1007/s12539-023-00573-z. Epub 2023 Jun 12.
10
gGATLDA: lncRNA-disease association prediction based on graph-level graph attention network.
BMC Bioinformatics. 2022 Jan 4;23(1):11. doi: 10.1186/s12859-021-04548-z.

Cited by

1
Decoding potential lncRNA and disease associations through graph representation learning and gradient boosting with histogram.
Sci Rep. 2025 Aug 26;15(1):31407. doi: 10.1038/s41598-025-16177-0.
2
Modeling ncRNA Synergistic Regulation in Cancer.
Methods Mol Biol. 2025;2883:377-402. doi: 10.1007/978-1-0716-4290-0_17.
3
Predicting lncRNA-Disease Associations Based on a Dual-Path Feature Extraction Network with Multiple Sources of Information Integration.
ACS Omega. 2024 Jul 30;9(32):35100-35112. doi: 10.1021/acsomega.4c05365. eCollection 2024 Aug 13.
4
IGCNSDA: unraveling disease-associated snoRNAs with an interpretable graph convolutional network.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae179.
5
Finding potential lncRNA-disease associations using a boosting-based ensemble learning model.
Front Genet. 2024 Mar 1;15:1356205. doi: 10.3389/fgene.2024.1356205. eCollection 2024.
6
GCNFORMER: graph convolutional network and transformer for predicting lncRNA-disease associations.
BMC Bioinformatics. 2024 Jan 2;25(1):5. doi: 10.1186/s12859-023-05625-1.
7
LDA-VGHB: identifying potential lncRNA-disease associations with singular value decomposition, variational graph auto-encoder and heterogeneous Newton boosting machine.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad466.
8
iLncDA-RSN: identification of lncRNA-disease associations based on reliable similarity networks.
Front Genet. 2023 Aug 8;14:1249171. doi: 10.3389/fgene.2023.1249171. eCollection 2023.
9
Machine Learning-Based Blood RNA Signature for Diagnosis of Autism Spectrum Disorder.
Int J Mol Sci. 2023 Jan 20;24(3):2082. doi: 10.3390/ijms24032082.
10
SCCPMD: Probability matrix decomposition method subject to corrected similarity constraints for inferring long non-coding RNA-disease associations.
Front Microbiol. 2023 Jan 11;13:1093615. doi: 10.3389/fmicb.2022.1093615. eCollection 2022.

References

1
NPCMF: Nearest Profile-based Collaborative Matrix Factorization method for predicting miRNA-disease associations.
BMC Bioinformatics. 2019 Jun 24;20(1):353. doi: 10.1186/s12859-019-2956-5.
2
Long non-coding RNAs: Functional regulatory players in breast cancer.
Noncoding RNA Res. 2019 Feb 5;4(1):36-44. doi: 10.1016/j.ncrna.2019.01.003. eCollection 2019 Mar.
3
Landslide spatial modelling using novel bivariate statistical based Naïve Bayes, RBF Classifier, and RBF Network machine learning algorithms.
Sci Total Environ. 2019 May 1;663:1-15. doi: 10.1016/j.scitotenv.2019.01.329. Epub 2019 Jan 26.
4
TPGLDA: Novel prediction of associations between lncRNAs and diseases via lncRNA-disease-gene tripartite graph.
Sci Rep. 2018 Jan 18;8(1):1065. doi: 10.1038/s41598-018-19357-3.
5
Long non-coding RNA expression in bladder cancer.
Biophys Rev. 2018 Aug;10(4):1205-1213. doi: 10.1007/s12551-017-0379-y. Epub 2017 Dec 8.
6
Long noncoding RNA-HOTAIR affects chemoresistance by regulating HOXA1 methylation in small cell lung cancer cells.
Lab Invest. 2016 Jan;96(1):60-8. doi: 10.1038/labinvest.2015.123. Epub 2015 Nov 2.
7
Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers.
Nucleic Acids Res. 2016 Jan 4;44(D1):D980-5. doi: 10.1093/nar/gkv1094. Epub 2015 Oct 19.
8
Combined identification of long non-coding RNA XIST and HIF1A-AS1 in serum as an effective screening for non-small cell lung cancer.
Int J Clin Exp Pathol. 2015 Jul 1;8(7):7887-95. eCollection 2015.
9
Predicting lncRNA-disease associations and constructing lncRNA functional similarity network based on the information of miRNA.
Sci Rep. 2015 Aug 17;5:13186. doi: 10.1038/srep13186.
10
ncPred: ncRNA-Disease Association Prediction through Tripartite Network-Based Inference.
Front Bioeng Biotechnol. 2014 Dec 12;2:71. doi: 10.3389/fbioe.2014.00071. eCollection 2014.