LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA-protein interaction identification.

Affiliations

School of Computer Science, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China.

College of Life Sciences and Chemistry, Hunan University of Technology, No. 88, Taishan West Road, Tianyuan District, Zhuzhou, China.

Publication information

BMC Bioinformatics. 2021 Oct 4;22(1):479. doi: 10.1186/s12859-021-04399-8.

Abstract

BACKGROUND

Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovering lncRNA-protein interactions (LPIs) helps to elucidate the biological functions and mechanisms of lncRNAs. Wet-lab experiments have identified some interactions between lncRNAs and proteins, but these techniques are costly and time-consuming. Computational methods are therefore increasingly used to uncover possible associations. However, existing computational methods have several limitations. First, most of them were evaluated on a single simple dataset, which may bias their predictions. Second, few of them can be applied to identify interactions for new lncRNAs (or proteins). Finally, they fail to exploit the diverse biological information of lncRNAs and proteins.

RESULTS

This work classifies unobserved LPIs with a feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT). First, three human LPI datasets and two plant LPI datasets are compiled. Second, biological features of lncRNAs and proteins are extracted with Pyfeat and BioProt, respectively. Third, the features are dimensionally reduced and concatenated into one vector that represents each lncRNA-protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings predicts potential links between lncRNAs and proteins. LPI-deepGBDT is compared with five classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, LPI-SKF, and LPI-HNM) under three cross-validation settings: on lncRNAs, on proteins, and on lncRNA-protein pairs. It obtains the best average AUC and AUPR values in most cases and significantly outperforms the other five LPI identification methods. Under the three settings, LPI-deepGBDT achieves AUCs of 0.8321, 0.6815, and 0.9073 and AUPRs of 0.8095, 0.6771, and 0.8849, respectively. These results demonstrate the strong classification ability of LPI-deepGBDT. Case studies suggest that GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637 may interact.
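The third step (dimensionality reduction followed by concatenation) can be illustrated with a minimal sketch. The sketch makes several assumptions:

- The Pyfeat (lncRNA) and BioProt (protein) outputs are already loaded as numeric matrices, with one row per molecule.
- PCA stands in for the dimensionality-reduction method, which the abstract does not name.
- The helper names `pca_reduce` and `build_pair_vectors`, the target dimensions, and the random toy matrices are all hypothetical.

```python
import numpy as np

def pca_reduce(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of `features` onto their top principal components (PCA via SVD).

    Assumption: PCA is a stand-in for the paper's unspecified reduction step.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    if n_components > vt.shape[0]:
        raise ValueError(f"at most {vt.shape[0]} components are available")
    return centered @ vt[:n_components].T

def build_pair_vectors(rna_feats, prot_feats, pairs, rna_dim, prot_dim):
    """Concatenate reduced lncRNA and protein features for each (lncRNA row, protein row) pair.

    Hypothetical helper for illustration only.
    """
    rna_low = pca_reduce(rna_feats, rna_dim)
    prot_low = pca_reduce(prot_feats, prot_dim)
    return np.stack([np.concatenate([rna_low[i], prot_low[j]]) for i, j in pairs])

# Toy usage: random stand-ins for real Pyfeat/BioProt feature matrices.
rng = np.random.default_rng(0)
rna_feats = rng.normal(size=(30, 200))   # 30 lncRNAs x 200 hypothetical features
prot_feats = rng.normal(size=(20, 150))  # 20 proteins x 150 hypothetical features
pairs = [(0, 1), (5, 3), (29, 19)]       # (lncRNA index, protein index)
X = build_pair_vectors(rna_feats, prot_feats, pairs, rna_dim=16, prot_dim=8)
print(X.shape)  # (3, 24)
```

This sketch reduces each modality separately and then concatenates, following the order of operations in the abstract. A real pipeline would typically also standardize the raw features before PCA.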
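The forward and inverse mappings in the final step resemble the multi-layered GBDT (mGBDT) scheme. In that scheme, every layer is a GBDT, and layers are trained by target propagation rather than back-propagation. The abstract does not give LPI-deepGBDT's equations, so the following assumes it follows this general scheme. All symbols are introduced here for illustration:

- x is a pair vector, and y is its interaction label.
- F_i and G_i are the forward and inverse GBDT mappings of layer i, and M is the number of layers.
- α is a step size, and σ is a noise scale.

Under these assumptions, one training round can be written as:

```latex
\begin{aligned}
o_0 &= x, \qquad o_i = F_i(o_{i-1}), \quad i = 1, \dots, M,\\
z_M &= o_M - \alpha \, \frac{\partial L(o_M, y)}{\partial o_M},\\
z_{i-1} &= G_i(z_i), \quad i = M, \dots, 2,\\
L_i^{\mathrm{inv}} &= \bigl\| G_i\bigl(F_i(o_{i-1} + \varepsilon)\bigr) - (o_{i-1} + \varepsilon) \bigr\|^2,
  \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \quad i = 2, \dots, M,\\
L_i &= \bigl\| F_i(o_{i-1}) - z_i \bigr\|^2, \quad i = 1, \dots, M.
\end{aligned}
```

Each round then updates the two kinds of mapping:

- For i ≥ 2, one or more trees are added to G_i by fitting the negative gradient of L_i^inv, so that G_i learns to approximately undo F_i.
- One or more trees are added to each F_i by fitting the negative gradient of L_i, which pulls the layer's output toward the propagated target z_i.

The squared Euclidean distance is used here as an example layer loss. The paper may use a different choice.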
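The three cross-validation settings differ only in what is held out. The minimal, dependency-free sketch below assumes that each labeled sample (positive or negative) is an (lncRNA id, protein id) pair:

- Under the lncRNA and protein settings, the held-out molecules never occur in the training folds, so these settings test prediction for new lncRNAs or proteins.
- The pair setting splits pairs at random.

The function name `cv_folds`, the fold count, and the toy identifiers are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

def cv_folds(pairs: Sequence[Tuple[str, str]], setting: str, k: int = 5,
             seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for one of three hypothetical CV settings.

    "lncrna":  held-out lncRNAs never appear in training (new-lncRNA scenario).
    "protein": held-out proteins never appear in training (new-protein scenario).
    "pair":    individual pairs are split at random.
    """
    if setting == "lncrna":
        group = [rna for rna, _ in pairs]
    elif setting == "protein":
        group = [prot for _, prot in pairs]
    elif setting == "pair":
        group = list(range(len(pairs)))
    else:
        raise ValueError(f"unknown setting: {setting!r}")
    keys = sorted(set(group))            # deterministic order before shuffling
    random.Random(seed).shuffle(keys)
    fold_of = {key: n % k for n, key in enumerate(keys)}  # round-robin fold assignment
    for fold in range(k):
        test = [i for i, g in enumerate(group) if fold_of[g] == fold]
        train = [i for i, g in enumerate(group) if fold_of[g] != fold]
        yield train, test

# Toy usage with hypothetical identifiers (interaction labels omitted).
pairs = [("lnc1", "P1"), ("lnc1", "P2"), ("lnc2", "P1"),
         ("lnc3", "P3"), ("lnc3", "P2"), ("lnc4", "P1")]
for train, test in cv_folds(pairs, "lncrna", k=2):
    held_out = {pairs[i][0] for i in test}
    seen = {pairs[i][0] for i in train}
    print(sorted(held_out), held_out.isdisjoint(seen))  # disjointness holds for every fold
```

Grouping by molecule rather than by pair is what separates the first two settings from the third. A random pair split lets a test lncRNA appear in training through its other partners.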
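The reported results are average AUC and AUPR values. The dependency-free sketch below computes both metrics for binary labels (1 = interacting pair, 0 = non-interacting pair):

- AUC uses the rank-based (Mann-Whitney) formulation, with tied scores sharing their average rank.
- AUPR is approximated by average precision, one common estimator. The paper may instead use another convention, such as trapezoidal interpolation of the precision-recall curve.

The function names and toy scores are hypothetical.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels))  # ascending by score
    rank_sum_pos = 0.0
    i = 0
    while i < len(ranked):
        j = i
        while j + 1 < len(ranked) and ranked[j + 1][0] == ranked[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied scores share their mean 1-based rank
        rank_sum_pos += avg_rank * sum(lbl for _, lbl in ranked[i:j + 1])
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPR estimated as average precision: sum over thresholds of (R_n - R_{n-1}) * P_n."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), reverse=True)  # descending by score
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:  # consume tied scores together
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
print(round(roc_auc(labels, scores), 4))            # 0.6667
print(round(average_precision(labels, scores), 4))  # 0.8056
```

Averaging the per-fold values within each cross-validation setting gives summary numbers of the same kind as those reported in the abstract.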

CONCLUSIONS

By integrating ensemble learning with hierarchical distributed representations in a multiple-layer deep architecture, this work improves LPI prediction performance and effectively uncovers interactions for new lncRNAs and proteins.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d6/8489074/e852b900a4b0/12859_2021_4399_Fig1_HTML.jpg
