Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark.
VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae010.
Protein networks are commonly used to understand how proteins interact. However, they are typically biased by data availability, favoring well-studied proteins with more known interactions. To uncover the functions of understudied proteins, we must use data that are not affected by this literature bias, such as single-cell RNA-seq and proteomics. Because such data are sparse and redundant, functional association analysis becomes complex.
To address this, we have developed FAVA (Functional Associations using Variational Autoencoders), which compresses high-dimensional data into a low-dimensional space. FAVA infers networks from high-dimensional omics data with much higher accuracy than existing methods, across a diverse collection of real and simulated datasets. FAVA can process large datasets with over 0.5 million conditions and has predicted 4210 interactions between 1039 understudied proteins. Our findings showcase FAVA's ability to offer novel perspectives on protein interactions. FAVA works within the scverse ecosystem and takes AnnData objects as input.
Source code, documentation, and tutorials for FAVA are available on GitHub at https://github.com/mikelkou/fava. FAVA can also be installed and used via pip/PyPI.
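The general idea of compressing expression profiles into a low-dimensional space and scoring protein pairs there can be illustrated with a minimal sketch. This is not FAVA's implementation or API: a truncated SVD stands in for the variational autoencoder so that the example needs only NumPy, and the function name `latent_correlation_network` and its parameters `latent_dim` and `top_k` are hypothetical names chosen for illustration.

```python
import numpy as np


def latent_correlation_network(expr, latent_dim=8, top_k=10):
    """Rank protein pairs by correlation of their low-dimensional embeddings.

    expr: array of shape (n_proteins, n_conditions).
    Returns a list of (i, j, score) tuples, highest score first.
    NOTE: truncated SVD is an assumed stand-in for a VAE encoder here.
    """
    expr = np.asarray(expr, dtype=float)
    # Center each protein's profile across conditions.
    centered = expr - expr.mean(axis=1, keepdims=True)
    # Compress every protein into at most `latent_dim` coordinates.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(latent_dim, s.shape[0])
    latent = u[:, :k] * s[:k]
    # Pearson correlation between the latent vectors of all protein pairs.
    corr = np.nan_to_num(np.corrcoef(latent), nan=0.0)
    i_idx, j_idx = np.triu_indices_from(corr, k=1)
    scores = corr[i_idx, j_idx]
    # Keep the top_k highest-scoring pairs.
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i_idx[o]), int(j_idx[o]), float(scores[o])) for o in order]


# Small synthetic demo: proteins 0 and 1 share one strong signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
demo = rng.normal(size=(20, 200))
demo[0] = 10 * signal + 0.01 * rng.normal(size=200)
demo[1] = 10 * signal + 0.01 * rng.normal(size=200)
for i, j, score in latent_correlation_network(demo, top_k=3):
    print(f"protein {i} - protein {j}: {score:.3f}")
```

A linear projection is used here only because it is compact and deterministic; a variational autoencoder learns a nonlinear embedding, so the scores from this sketch should not be expected to match those produced by FAVA.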