KSFinder: a knowledge graph model for link prediction of novel phosphorylated substrates of kinases.

Affiliations

Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, United States of America.

Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, United States of America.

Publication information

PeerJ. 2023 Oct 6;11:e16164. doi: 10.7717/peerj.16164. eCollection 2023.

Abstract

BACKGROUND

Aberrant protein kinase regulation leading to abnormal substrate phosphorylation is associated with several human diseases. Despite the promise of therapies targeting kinases, many human kinases remain understudied. Most existing computational tools predicting phosphorylation cover less than 50% of known human kinases. They utilize local feature selection based on protein sequences, motifs, domains, structures, and/or functions, and do not consider the heterogeneous relationships of the proteins. In this work, we present KSFinder, a tool that predicts kinase-substrate links by capturing the inherent association of proteins in a network comprising 85% of the known human kinases. We also postulate the potential role of two understudied kinases based on their substrate predictions from KSFinder.

METHODS

KSFinder learns the semantic relationships in a phosphoproteome knowledge graph using a knowledge graph embedding algorithm and represents the nodes as low-dimensional vectors. A multilayer perceptron (MLP) classifier is trained to discern kinase-substrate links using the embedded vectors. KSFinder uses a strategic negative generation approach that eliminates biases in entity representation and combines data from experimentally validated non-interacting protein pairs, proteins from different subcellular locations, and random sampling. We assess KSFinder's generalization capability on four different datasets and compare its performance with other state-of-the-art prediction models. We employ KSFinder to predict substrates of 68 "dark" kinases considered understudied by the Illuminating the Druggable Genome program and use our text-mining tool, RLIMS-P, along with manual curation to search for literature evidence for the predictions. In a case study, we perform functional enrichment analysis for two dark kinases, HIPK3 and CAMKK1, using their predicted substrates.
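The three negative sources named above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_negatives`, its parameters, the toy protein identifiers, and the rule that a pair counts as negative when the two proteins have different location annotations are all assumptions made for illustration. The bias-elimination step mentioned in the abstract is not modeled here.

```python
import random


def build_negatives(positives, non_interacting, location, n_random, seed=0):
    """Assemble negative (kinase, protein) pairs from three sources.

    Hypothetical helper: the abstract names the sources but not the procedure.
    positives:        known kinase-substrate pairs (never used as negatives)
    non_interacting:  experimentally validated non-interacting pairs
    location:         protein -> subcellular location annotation
    n_random:         number of extra randomly sampled pairs
    """
    known = set(positives)
    kinases = sorted({k for k, _ in positives})
    proteins = sorted(location)

    # Source 1: experimentally validated non-interacting pairs.
    negatives = {pair for pair in non_interacting if pair not in known}

    # Source 2: pairs whose members are annotated to different locations.
    for k in kinases:
        for p in proteins:
            if p == k or k not in location or (k, p) in known:
                continue
            if location[k] != location[p]:
                negatives.add((k, p))

    # Source 3: random sampling from the remaining unlabeled pairs.
    rng = random.Random(seed)
    remaining = [
        (k, p)
        for k in kinases
        for p in proteins
        if p != k and (k, p) not in known and (k, p) not in negatives
    ]
    rng.shuffle(remaining)
    negatives.update(remaining[:n_random])
    return negatives


# Toy data (hypothetical identifiers, not real proteins).
positives = [("K1", "S1"), ("K2", "S2")]
non_interacting = [("K1", "S2")]
location = {
    "K1": "nucleus",
    "K2": "cytoplasm",
    "S1": "nucleus",
    "S2": "cytoplasm",
    "S3": "nucleus",
}
negatives = build_negatives(positives, non_interacting, location, n_random=1)
```

Excluding known positives from every source keeps the classifier from being trained on contradictory labels, whatever mix of sources is chosen.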

RESULTS

KSFinder shows improved performance over other kinase-substrate prediction models and generalized prediction ability on different datasets. We identified literature evidence for 17 novel predictions involving an understudied kinase. All 17 predictions had a probability score ≥0.7 (nine at >0.9, six at 0.8-0.9, and two at 0.7-0.8). The evaluation of 93,593 negative predictions (probability ≤0.3) identified four false negatives. The top enriched biological processes of HIPK3 substrates relate to the regulation of extracellular matrix and epigenetic gene expression, while those of CAMKK1 substrates include lipid storage regulation and glucose homeostasis.
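As a small illustration of how predictions could be grouped into the score ranges reported above, the sketch below bins probability scores and tallies them. The function name `score_bin`, the example scores, and the treatment of boundary values (for example, placing exactly 0.8 in the 0.8-0.9 range) are assumptions; the abstract does not specify them.

```python
from collections import Counter


def score_bin(p):
    """Map a predicted probability to a reporting range (boundaries assumed)."""
    if p > 0.9:
        return ">0.9"
    if p >= 0.8:
        return "0.8-0.9"
    if p >= 0.7:
        return "0.7-0.8"
    if p <= 0.3:
        return "negative (<=0.3)"
    return "unbinned"


# Hypothetical scores, not taken from the paper.
scores = [0.95, 0.91, 0.85, 0.72, 0.10, 0.50]
counts = Counter(score_bin(p) for p in scores)

# The counts reported in the abstract add up to the 17 supported predictions.
reported = {">0.9": 9, "0.8-0.9": 6, "0.7-0.8": 2}
total_supported = sum(reported.values())
```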

CONCLUSIONS

KSFinder outperforms the current kinase-substrate prediction tools with higher kinase coverage. The strategically developed negatives provide a superior generalization ability for KSFinder. We predicted substrates of 432 kinases, 68 of which are understudied, and hypothesized the potential functions of two dark kinases using their predicted substrates.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7feb/10561642/2d05a046e6b4/peerj-11-16164-g001.jpg
