Gemini: memory-efficient integration of hundreds of gene networks with high-order pooling.

Affiliation

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States.

Publication information

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i504-i512. doi: 10.1093/bioinformatics/btad247.

Abstract

MOTIVATION

The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical to learn informative representations for each gene, which are later used as features for downstream applications. However, these network integration methods must be scalable to account for the increasing number of networks and robust to an uneven distribution of network types within hundreds of gene networks.

RESULTS

To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent and weight each network according to its uniqueness. Gemini then mitigates the uneven network distribution through mixing up existing networks to create many new networks. We find that Gemini leads to more than a 10% improvement in F1 score, 15% improvement in micro-AUPRC, and 63% improvement in macro-AUPRC for human protein function prediction by integrating hundreds of networks from BioGRID, and that Gemini's performance significantly improves when more networks are added to the input network collection, while Mashup and BIONIC embeddings' performance deteriorates. Gemini thereby enables memory-efficient and informative network integration for large gene networks and can be used to massively integrate and analyze networks in other domains.
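The two ideas named above, weighting each network by its uniqueness and mixing existing networks into new ones, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository for that). It assumes that each network is a dense adjacency matrix over the same genes, that "high-order pooling" can be approximated by a row-wise standardized fourth moment, that uniqueness is the mean distance to the other networks' pooled vectors, and that mixing is a convex combination with a Beta-distributed coefficient. The function names `pool_network`, `uniqueness_weights`, and `mixup_networks` are hypothetical.

```python
import numpy as np


def pool_network(adj, order=4):
    """Summarize a network as one value per node: the standardized
    moment of the given order over that node's row (assumed stand-in
    for high-order pooling)."""
    centered = adj - adj.mean(axis=1, keepdims=True)
    scale = adj.std(axis=1, keepdims=True) + 1e-12  # avoid division by zero
    return np.mean((centered / scale) ** order, axis=1)


def uniqueness_weights(networks, order=4):
    """Weight each network by its mean Euclidean distance to the other
    networks' pooled vectors; weights sum to 1."""
    reps = np.stack([pool_network(a, order) for a in networks])  # (m, n)
    diffs = reps[:, None, :] - reps[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))                    # (m, m)
    score = dist.sum(axis=1) / (len(networks) - 1)
    if score.sum() == 0:  # all networks identical: fall back to uniform
        return np.full(len(networks), 1.0 / len(networks))
    return score / score.sum()


def mixup_networks(networks, weights, n_new, alpha=1.0, seed=0):
    """Create n_new networks, each a convex combination of two distinct
    input networks sampled with probability given by `weights`."""
    rng = np.random.default_rng(seed)
    mixed = []
    for _ in range(n_new):
        i, j = rng.choice(len(networks), size=2, replace=False, p=weights)
        lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
        mixed.append(lam * networks[i] + (1.0 - lam) * networks[j])
    return mixed


# Toy collection: two copies of one network type plus one distinct network.
rng = np.random.default_rng(0)
net_a = rng.random((5, 5))
net_a = (net_a + net_a.T) / 2
net_b = rng.random((5, 5)) ** 3
net_b = (net_b + net_b.T) / 2
nets = [net_a, net_a.copy(), net_b]

w = uniqueness_weights(nets)
new_nets = mixup_networks(nets, w, n_new=4)
print(w)
```

In this toy collection the distinct network receives a larger weight than either of the two duplicated ones, so it is sampled more often during mixing. This is the intuition behind compensating for an uneven distribution of network types.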

AVAILABILITY AND IMPLEMENTATION

Gemini can be accessed at: https://github.com/MinxZ/Gemini.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1ca9/10311345/81df43f78930/btad247f1.jpg
