
An improved approach to infer protein-protein interaction based on a hierarchical vector space model.

Affiliations

Department of Computer Science & Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062, China.

School of life science, East China Normal University, Dongchuan Road, Shanghai, 200241, China.

Publication information

BMC Bioinformatics. 2018 Apr 27;19(1):161. doi: 10.1186/s12859-018-2152-z.

DOI:10.1186/s12859-018-2152-z
PMID:29699476
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5921294/
Abstract

BACKGROUND

Comparing and classifying the functions of gene products is important in today's biomedical research. Semantic similarity derived from Gene Ontology (GO) annotation is regarded as one of the most widely used indicators of protein interaction. Among the various approaches proposed, those based on the vector space model are relatively simple, but their effectiveness is far from satisfactory.

RESULTS

We propose a Hierarchical Vector Space Model (HVSM) for computing semantic similarity between different genes or their products, which enhances the basic vector space model by introducing the relations between GO terms. Besides the directly annotated terms, HVSM also takes into account their ancestors and descendants related by "is_a" and "part_of" relations. Moreover, HVSM introduces the concept of a Certainty Factor to calibrate the semantic similarity based on the number of terms annotated to genes. To assess the performance of our method, we applied HVSM to Homo sapiens and Saccharomyces cerevisiae protein-protein interaction datasets. Compared with TCSS, Resnik, and other classic similarity measures, HVSM achieved a significant improvement in distinguishing positive from negative protein interactions. We also tested its correlation with sequence, EC, and Pfam similarity using the online tool CESSM.
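To make the idea of extending a vector space model with ontology relations concrete, here is a minimal sketch. It is not the authors' HVSM: the abstract does not give the weighting scheme or the Certainty Factor formula, so the toy ontology (`PARENTS`), the term identifiers, and the `ancestor_weight` of 0.5 are all assumptions chosen for illustration. Descendants and the Certainty Factor are omitted.

```python
import math

# Hypothetical toy ontology: each term maps to the parents it reaches
# through "is_a" or "part_of" edges. Term IDs are made up for illustration.
PARENTS = {
    "GO:A": [],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:A"],
    "GO:E": ["GO:C"],
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def term_vector(annotations, ancestor_weight=0.5):
    """Build a sparse vector: 1.0 for direct terms, a smaller assumed
    weight for ancestors that are not themselves directly annotated."""
    vector = {term: 1.0 for term in annotations}
    for term in annotations:
        for anc in ancestors(term):
            vector[anc] = max(vector.get(anc, 0.0), ancestor_weight)
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two genes with no directly shared term.
gene1, gene2 = {"GO:E"}, {"GO:D"}
basic = cosine({t: 1.0 for t in gene1}, {t: 1.0 for t in gene2})
hierarchical = cosine(term_vector(gene1), term_vector(gene2))
```

In this toy case the basic model scores the pair as 0 because the genes share no directly annotated term, while the ancestor-expanded vectors overlap on the common ancestor `GO:A` and yield a small positive similarity.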

CONCLUSIONS

In terms of AUC scores, HVSM showed an improvement of up to 4% compared to TCSS, 8% compared to IntelliGO, 12% compared to basic VSM, 6% compared to Resnik, 8% compared to Lin, 11% compared to Jiang, 8% compared to Schlicker, and 11% compared to SimGIC. The CESSM test showed that HVSM was comparable to SimGIC and superior to all other similarity measures in CESSM as well as TCSS. Supplementary information and the software are available at https://github.com/kejia1215/HVSM .
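The comparisons above are stated as AUC scores. As a rough illustration of that metric, the sketch below computes AUC as the probability that a randomly chosen positive (interacting) pair receives a higher similarity score than a randomly chosen negative pair, counting ties as one half. The function name and the example scores are hypothetical and are not taken from the paper's datasets.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the normalized Mann-Whitney U statistic; runs in
    O(len(positive_scores) * len(negative_scores)) time, which is fine
    for a small illustration.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up similarity scores for interacting and non-interacting pairs.
example_auc = auc([0.9, 0.4], [0.5, 0.1])
```

Here three of the four positive-negative comparisons favor the positive pair, so `example_auc` is 0.75. A measure that separates the two classes perfectly would score 1.0, and one that assigns identical scores to every pair would score 0.5.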


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c22b/5921294/6577f2bf8519/12859_2018_2152_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c22b/5921294/5f19fb861e8c/12859_2018_2152_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c22b/5921294/386ecc3a5e04/12859_2018_2152_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c22b/5921294/4a2e58bf82e7/12859_2018_2152_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c22b/5921294/8215cfebebe2/12859_2018_2152_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c22b/5921294/3a75e253fc79/12859_2018_2152_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c22b/5921294/f5fcfdb5dbea/12859_2018_2152_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c22b/5921294/2ca234ed6e8f/12859_2018_2152_Fig8_HTML.jpg

Similar articles

1. An improved approach to infer protein-protein interaction based on a hierarchical vector space model.
   BMC Bioinformatics. 2018 Apr 27;19(1):161. doi: 10.1186/s12859-018-2152-z.
2. IntelliGO: a new vector-based semantic similarity measure including annotation origin.
   BMC Bioinformatics. 2010 Dec 1;11:588. doi: 10.1186/1471-2105-11-588.
3. An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology.
   BMC Bioinformatics. 2010 Nov 15;11:562. doi: 10.1186/1471-2105-11-562.
4. Correlating information contents of gene ontology terms to infer semantic similarity of gene products.
   Comput Math Methods Med. 2014;2014:891842. doi: 10.1155/2014/891842. Epub 2014 May 22.
5. Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty.
   Bioinformatics. 2012 May 15;28(10):1383-9. doi: 10.1093/bioinformatics/bts129. Epub 2012 Apr 19.
6. Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method.
   PLoS One. 2013 May 31;8(5):e66745. doi: 10.1371/journal.pone.0066745. Print 2013.
7. TransformerGO: predicting protein-protein interactions by modelling the attention between sets of gene ontology terms.
   Bioinformatics. 2022 Apr 12;38(8):2269-2277. doi: 10.1093/bioinformatics/btac104.
8. Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations.
   Bioinformatics. 2018 Jul 1;34(13):i52-i60. doi: 10.1093/bioinformatics/bty259.
9. Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins.
   PLoS One. 2023 Apr 21;18(4):e0284274. doi: 10.1371/journal.pone.0284274. eCollection 2023.
10. GO2Vec: transforming GO terms and proteins to vector representations via graph embeddings.
    BMC Genomics. 2019 Dec 24;20(Suppl 9):918. doi: 10.1186/s12864-019-6272-2.

Cited by

1. Large-Scale Protein Interactions Prediction by Multiple Evidence Analysis Associated With an In-Silico Curation Strategy.
   Front Bioinform. 2021 Sep 6;1:731345. doi: 10.3389/fbinf.2021.731345. eCollection 2021.
2. A Collection of Benchmark Data Sets for Knowledge Graph-based Similarity in the Biomedical Domain.
   Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa078.
3. Computational identification of protein-protein interactions in model plant proteomes.
   Sci Rep. 2019 Jun 19;9(1):8740. doi: 10.1038/s41598-019-45072-8.
4. Protein-protein interaction prediction using a hybrid feature representation and a stacked generalization scheme.
   BMC Bioinformatics. 2019 Jun 10;20(1):308. doi: 10.1186/s12859-019-2907-1.

References

1. Assessment of Semantic Similarity between Proteins Using Information Content and Topological Properties of the Gene Ontology Graph.
   IEEE/ACM Trans Comput Biol Bioinform. 2018 May-Jun;15(3):839-849. doi: 10.1109/TCBB.2017.2689762. Epub 2017 Mar 31.
2. GFD-Net: A novel semantic similarity methodology for the analysis of gene networks.
   J Biomed Inform. 2017 Apr;68:71-82. doi: 10.1016/j.jbi.2017.02.013. Epub 2017 Mar 6.
3. The effects of shared information on semantic calculations in the gene ontology.
   Comput Struct Biotechnol J. 2017 Jan 30;15:195-211. doi: 10.1016/j.csbj.2017.01.009. eCollection 2017.
4. TopoICSim: a new semantic similarity measure based on gene ontology.
   BMC Bioinformatics. 2016 Jul 29;17(1):296. doi: 10.1186/s12859-016-1160-0.
5. Gene Ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery.
   Brief Bioinform. 2017 Sep 1;18(5):886-901. doi: 10.1093/bib/bbw067.
6. Protein-protein interaction inference based on semantic similarity of Gene Ontology terms.
   J Theor Biol. 2016 Jul 21;401:30-7. doi: 10.1016/j.jtbi.2016.04.020. Epub 2016 Apr 23.
7. Prediction of protein-protein interactions with clustered amino acids and weighted sparse representation.
   Int J Mol Sci. 2015 May 13;16(5):10855-69. doi: 10.3390/ijms160510855.
8. AdaBoost based multi-instance transfer learning for predicting proteome-wide interactions between Salmonella and human proteins.
   PLoS One. 2014 Oct 17;9(10):e110488. doi: 10.1371/journal.pone.0110488. eCollection 2014.
9. Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method.
   PLoS One. 2013 May 31;8(5):e66745. doi: 10.1371/journal.pone.0066745. Print 2013.
10. Semantic similarity analysis of protein data: assessment with biological features and issues.
    Brief Bioinform. 2012 Sep;13(5):569-85. doi: 10.1093/bib/bbr066. Epub 2011 Dec 2.