LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification.

Authors

Peng Lihong, Yuan Ruya, Shen Ling, Gao Pengfei, Zhou Liqian

Affiliations

School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.

College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.

Publication information

BioData Min. 2021 Dec 3;14(1):50. doi: 10.1186/s13040-021-00277-4.

DOI: 10.1186/s13040-021-00277-4
PMID: 34861891
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8642957/

Abstract

BACKGROUND

Long noncoding RNAs (lncRNAs) are closely linked to various biological processes. Identifying interacting lncRNA-protein pairs contributes to understanding the functions and mechanisms of lncRNAs. Wet-lab experiments are costly and time-consuming. Most computational methods fail to account for the imbalanced nature of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which introduces prediction bias.

RESULTS

In this study, we develop an Ensemble framework (LPI-EnEDT) with Extra tree and Decision Tree classifiers to classify imbalanced LPI data. First, five LPI datasets are assembled. Second, lncRNAs and proteins are characterized separately using Pyfeat and BioTriangle, and the two feature vectors are concatenated to represent each lncRNA-protein pair. Finally, an ensemble framework with extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. Comparative experiments show that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross-validation on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross-validation settings, respectively. The average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest underlying associations between HOTTIP and Q9Y6M1, and between NRON and Q15717.
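The pipeline described in this paragraph (concatenating lncRNA and protein features, then classifying pairs with an ensemble of extra tree and decision tree models) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: scikit-learn's `ExtraTreesClassifier` and `DecisionTreeClassifier` stand in for the base learners, random vectors stand in for the Pyfeat and BioTriangle features, and the imbalance is handled by undersampling negatives and averaging scores. The function names `pair_features`, `fit_balanced_ensemble`, and `predict_scores` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier


def pair_features(lnc_feats, prot_feats, lnc_idx, prot_idx):
    """Concatenate the lncRNA and protein feature vectors of each pair."""
    return np.hstack([lnc_feats[lnc_idx], prot_feats[prot_idx]])


def fit_balanced_ensemble(X, y, n_rounds=5, seed=0):
    """Train one extra-trees and one decision-tree model per balanced subsample.

    Undersampling the negatives is an assumed way of handling imbalance,
    not necessarily the strategy used by LPI-EnEDT.
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_neg = min(len(pos), len(neg))
    models = []
    for r in range(n_rounds):
        # Draw as many negatives as there are positives for this round
        neg_sample = rng.choice(neg, size=n_neg, replace=False)
        idx = np.concatenate([pos, neg_sample])
        et = ExtraTreesClassifier(n_estimators=50, random_state=seed + r)
        dt = DecisionTreeClassifier(max_depth=5, random_state=seed + r)
        models.append(et.fit(X[idx], y[idx]))
        models.append(dt.fit(X[idx], y[idx]))
    return models


def predict_scores(models, X):
    """Average the positive-class probability over all base models."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


# Synthetic stand-ins for the real descriptors (placeholder dimensions)
rng = np.random.default_rng(42)
lnc_feats = rng.normal(size=(30, 8))    # 30 lncRNAs, 8 features each
prot_feats = rng.normal(size=(20, 6))   # 20 proteins, 6 features each
lnc_idx = np.repeat(np.arange(30), 20)  # every lncRNA paired with
prot_idx = np.tile(np.arange(20), 30)   # every protein: 600 pairs

X = pair_features(lnc_feats, prot_feats, lnc_idx, prot_idx)
# Synthetic, imbalanced labels: positives are the minority class
noise = rng.normal(scale=0.5, size=len(X))
y = (X[:, 0] + X[:, 8] + noise > 1.5).astype(int)

models = fit_balanced_ensemble(X, y)
scores = predict_scores(models, X)
```

Each entry of `scores` is an interaction score between 0 and 1 for one lncRNA-protein pair. In practice, the scores would be computed on unlabeled pairs and ranked to select candidates for follow-up.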

CONCLUSIONS

By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with extra tree and decision tree classifiers, this work focuses on imbalanced LPI data classification as well as on inferring interaction information for a new lncRNA (or protein).
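Inferring interactions for a new lncRNA corresponds to evaluating a model on lncRNAs it has never seen, which is what cross-validation on lncRNAs measures. The sketch below shows one way to set this up, assuming scikit-learn's `GroupKFold` with the lncRNA index as the group label, synthetic features and labels, and average precision as an estimate of AUPR. The abstract does not give the exact protocol, so the fold count, the stand-in model, and the helper name `cv_on_lncrnas` are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_lnc, n_prot = 30, 20
lnc_feats = rng.normal(size=(n_lnc, 8))    # placeholder lncRNA features
prot_feats = rng.normal(size=(n_prot, 6))  # placeholder protein features

lnc_idx = np.repeat(np.arange(n_lnc), n_prot)
prot_idx = np.tile(np.arange(n_prot), n_lnc)
X = np.hstack([lnc_feats[lnc_idx], prot_feats[prot_idx]])
# Synthetic labels driven by one protein feature plus noise
noise = rng.normal(scale=0.5, size=len(X))
y = (prot_feats[prot_idx, 0] + noise > 0.8).astype(int)


def cv_on_lncrnas(X, y, groups, n_splits=5):
    """Hold out whole lncRNAs per fold and return (AUC, AUPR) per fold."""
    results = []
    splitter = GroupKFold(n_splits=n_splits)
    for train, test in splitter.split(X, y, groups):
        # Both metrics need positives and negatives on each side
        if len(np.unique(y[train])) < 2 or len(np.unique(y[test])) < 2:
            continue
        model = ExtraTreesClassifier(
            n_estimators=50, class_weight="balanced", random_state=0
        )
        model.fit(X[train], y[train])
        scores = model.predict_proba(X[test])[:, 1]
        results.append(
            (roc_auc_score(y[test], scores),
             average_precision_score(y[test], scores))
        )
    return results


fold_metrics = cv_on_lncrnas(X, y, groups=lnc_idx)
mean_auc, mean_aupr = np.mean(fold_metrics, axis=0)
```

Passing `prot_idx` as `groups` gives the analogous cross-validation on proteins. Splitting pairs at random without groups would correspond to cross-validation on lncRNA-protein pairs.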

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a604/8642957/099a073db99f/13040_2021_277_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a604/8642957/1190afc550b4/13040_2021_277_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a604/8642957/42be1481cb21/13040_2021_277_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a604/8642957/ad57bc7bdb79/13040_2021_277_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a604/8642957/557b270d73e5/13040_2021_277_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a604/8642957/5daa9c437b3a/13040_2021_277_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a604/8642957/f617de10f1dd/13040_2021_277_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a604/8642957/74a88a726ed4/13040_2021_277_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a604/8642957/9d557a0055e8/13040_2021_277_Fig9_HTML.jpg
