
A best-match approach for gene set analyses in embedding spaces.

Affiliation

Department of Computer Science, Rice University, Houston, Texas 77005, USA.

Publication information

Genome Res. 2024 Oct 11;34(9):1421-1433. doi: 10.1101/gr.279141.124.

DOI:10.1101/gr.279141.124
PMID:39231608
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11529866/
Abstract

Embedding methods have emerged as a valuable class of approaches for distilling essential information from complex high-dimensional data into more accessible lower-dimensional spaces. Applications of embedding methods to biological data have demonstrated that gene embeddings can effectively capture physical, structural, and functional relationships between genes. However, this utility has been primarily realized by using gene embeddings for downstream machine-learning tasks. Much less has been done to examine the embeddings directly, especially analyses of gene sets in embedding spaces. Here, we propose an Algorithm for Network Data Embedding and Similarity (ANDES), a novel best-match approach that can be used with existing gene embeddings to compare gene sets while reconciling gene set diversity. This intuitive method has important downstream implications for improving the utility of embedding spaces for various tasks. Specifically, we show how ANDES, when applied to different gene embeddings encoding protein-protein interactions, can be used as a novel overrepresentation- and rank-based gene set enrichment analysis method that achieves state-of-the-art performance. Additionally, ANDES can use multiorganism joint gene embeddings to facilitate functional knowledge transfer across organisms, allowing for phenotype mapping across model systems. Our flexible, straightforward best-match methodology can be extended to other embedding spaces with diverse community structures between set elements.
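The abstract does not spell out how the best-match comparison is computed. As a minimal sketch of one plausible reading, the Python below scores two gene sets by matching each gene to its most similar gene in the other set (cosine similarity), averaging those best matches in each direction, and then averaging the two directions. The function names, the toy embedding, and the choice of cosine similarity are assumptions made for illustration. This is not the authors' implementation, and it omits any statistical correction the published method may apply.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match_score(set_a, set_b, emb):
    """Symmetric best-match average between two gene sets.

    emb maps a gene name to its embedding vector (hypothetical input format).
    Each gene is paired with its single most similar gene in the other set,
    so a set made of several distinct clusters is not penalized for the
    clusters that have no counterpart in a given pairing.
    """
    a_to_b = [max(cosine(emb[g], emb[h]) for h in set_b) for g in set_a]
    b_to_a = [max(cosine(emb[h], emb[g]) for g in set_a) for h in set_b]
    mean_a = sum(a_to_b) / len(a_to_b)
    mean_b = sum(b_to_a) / len(b_to_a)
    return 0.5 * (mean_a + mean_b)


# Toy 2-D embedding (made-up values): g1 and g3 point in one direction,
# g2 and g4 in another.
toy_emb = {
    "g1": [1.0, 0.0],
    "g2": [0.0, 1.0],
    "g3": [1.0, 0.1],
    "g4": [0.1, 1.0],
}

related = best_match_score(["g1", "g2"], ["g3", "g4"], toy_emb)
unrelated = best_match_score(["g1"], ["g2"], toy_emb)
```

In this toy case `related` is close to 1 because each gene has a near-identical partner in the other set, even though the two genes within each set are orthogonal to one another, while `unrelated` is 0. A plain average over all pairwise similarities would score the first pair of sets much lower, which is the kind of within-set diversity a best-match scheme is meant to tolerate.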


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/705f/11529866/83fa0cded647/1421f06.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/705f/11529866/9607fc0bdd16/1421f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/705f/11529866/59acc13d0a43/1421f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/705f/11529866/e1ea6a3837e8/1421f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/705f/11529866/25dd58d8b995/1421f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/705f/11529866/6920b5ca1111/1421f05.jpg

Similar articles

1
A best-match approach for gene set analyses in embedding spaces.
Genome Res. 2024 Oct 11;34(9):1421-1433. doi: 10.1101/gr.279141.124.
2
16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.
PLoS Comput Biol. 2019 Feb 26;15(2):e1006721. doi: 10.1371/journal.pcbi.1006721. eCollection 2019 Feb.
3
Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks.
PLoS One. 2021 Oct 15;16(10):e0258623. doi: 10.1371/journal.pone.0258623. eCollection 2021.
4
Principled approach to the selection of the embedding dimension of networks.
Nat Commun. 2021 Jun 18;12(1):3772. doi: 10.1038/s41467-021-23795-5.
5
Survey on graph embeddings and their applications to machine learning problems on graphs.
PeerJ Comput Sci. 2021 Feb 4;7:e357. doi: 10.7717/peerj-cs.357. eCollection 2021.
6
Transfer learning in proteins: evaluating novel protein learned representations for bioinformatics tasks.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac232.
7
Juxtapose: a gene-embedding approach for comparing co-expression networks.
BMC Bioinformatics. 2021 Mar 16;22(1):125. doi: 10.1186/s12859-021-04055-1.
8
BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale.
PLoS Comput Biol. 2020 Apr 23;16(4):e1007617. doi: 10.1371/journal.pcbi.1007617. eCollection 2020 Apr.
9
Modeling aspects of the language of life through transfer-learning protein sequences.
BMC Bioinformatics. 2019 Dec 17;20(1):723. doi: 10.1186/s12859-019-3220-8.
10
Learning Contextual Hierarchical Structure of Medical Concepts with Poincairé Embeddings to Clarify Phenotypes.
Pac Symp Biocomput. 2019;24:8-17.
