Liefeld Ted, Huang Edwin, Wenzel Alexander T, Yoshimoto Kenneth, Sharma Ashwyn K, Sicklick Jason K, Mesirov Jill P, Reich Michael
University of California San Diego, Department of Medicine, School of Medicine, La Jolla, CA, 92093, USA.
bioRxiv. 2023 Jun 27:2023.06.16.545370. doi: 10.1101/2023.06.16.545370.
Non-negative Matrix Factorization (NMF) is an algorithm that can reduce high-dimensional datasets of tens of thousands of genes to a handful of metagenes, which are biologically easier to interpret. Application of NMF to gene expression data has been limited by its computationally intensive nature, which hinders its use on large datasets such as single-cell RNA sequencing (scRNA-seq) count matrices. We have implemented NMF-based clustering to run on high-performance GPU compute nodes using CuPy, a GPU-backed Python library, and the Message Passing Interface (MPI). This reduces the computation time by up to three orders of magnitude and makes NMF clustering analysis of large RNA-Seq and scRNA-seq datasets practical. We have made the method freely available through the GenePattern gateway, which provides free public access to hundreds of tools for the analysis and visualization of multiple 'omic data types. Its web-based interface gives easy access to these tools and allows the creation of multi-step analysis pipelines on high-performance computing (HPC) clusters that enable reproducible research for non-programmers.
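To make the factorization concrete, the following is a minimal sketch of NMF followed by a cluster assignment, not the authors' implementation. It assumes the standard Lee-Seung multiplicative updates for the Frobenius-norm objective and the common convention of assigning each sample to its largest metagene coefficient; the abstract specifies neither. The names `nmf`, `cluster_samples`, and the `xp` parameter are hypothetical, and the MPI distribution of work is omitted.

```python
import numpy as np


def nmf(V, k, n_iter=200, seed=0, xp=np, eps=1e-9):
    """Approximate a non-negative matrix V (genes x samples) as W @ H.

    W (genes x k) holds the k metagenes and H (k x samples) holds the
    weight of each metagene in each sample. `xp` is the array module:
    NumPy by default; passing CuPy instead is assumed to run the same
    code on a GPU, since CuPy mirrors the NumPy calls used here.
    """
    V = xp.asarray(V, dtype=xp.float64)
    n_genes, n_samples = V.shape

    # Random strictly positive start, drawn on the host for reproducibility.
    rng = np.random.default_rng(seed)
    W = xp.asarray(rng.random((n_genes, k)) + eps)
    H = xp.asarray(rng.random((k, n_samples)) + eps)

    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_samples(H, xp=np):
    """Assign each sample (column of H) to its dominant metagene."""
    return xp.argmax(H, axis=0)


if __name__ == "__main__":
    # Toy expression matrix: 4 genes x 6 samples with two sample groups.
    V = np.array([
        [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
        [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 5.0, 5.0, 5.0],
        [0.0, 0.0, 0.0, 5.0, 5.0, 5.0],
    ])
    W, H = nmf(V, k=2)
    print("cluster labels:", cluster_samples(H))
    print("reconstruction error:", np.linalg.norm(V - W @ H))
```

Each iteration consists only of dense matrix products and element-wise operations, which is the kind of workload that maps well onto a GPU. Under the assumption above, calling `nmf(V, k, xp=cupy)` would move the arithmetic to the device without changing the function body.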