Max Mattessich, Joaquin Reyna, Edel Aron, Ferhat Ay, Misha Kilmer, Steven H. Kleinstein, Anna Konstorum
Department of Applied Mathematics, Northwestern University.
Center for Autoimmunity and Inflammation, La Jolla Institute for Immunology.
bioRxiv. 2024 Jun 10:2024.06.07.597819. doi: 10.1101/2024.06.07.597819.
With the increased reliance on multi-omics data for bulk and single cell analyses, robust approaches for unsupervised clustering, visualization, and feature selection are imperative. Joint dimensionality reduction methods can be applied to multi-omics datasets to derive a global sample embedding, analogous to the embedding produced by single-omic techniques such as Principal Components Analysis (PCA). Multiple co-inertia analysis (MCIA) is a method for joint dimensionality reduction that maximizes the covariance between block- and global-level embeddings. Current implementations of MCIA are not optimized for large datasets, such as those arising from single cell studies, and lack the ability to embed new data.
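As a worked illustration of the covariance criterion and the analogy to PCA, here is one common way to write the MCIA objective. It assumes $K$ blocks $X_k$ ($n$ samples by $p_k$ features), each column-centered, with any block weights absorbed into preprocessing. This is a sketch under those assumptions and not necessarily the exact form used by the authors. Here $\mathbf{t}$ is the global score and $\mathbf{a}_k$ are the block loadings:

$$
\max_{\mathbf{t},\,\mathbf{a}_1,\ldots,\mathbf{a}_K}\ \sum_{k=1}^{K}\operatorname{cov}^2\!\left(X_k\mathbf{a}_k,\ \mathbf{t}\right)
\quad\text{subject to}\quad \|\mathbf{a}_k\|=1,\ \ \|\mathbf{t}\|=1 .
$$

Because $X_k$ is column-centered, $\operatorname{cov}(X_k\mathbf{a}_k,\mathbf{t})=\mathbf{t}^{\top}X_k\mathbf{a}_k/(n-1)$. For a fixed $\mathbf{t}$, the Cauchy-Schwarz inequality gives the optimal block loading and its value:

$$
\mathbf{a}_k^{\ast}=\frac{X_k^{\top}\mathbf{t}}{\|X_k^{\top}\mathbf{t}\|},\qquad
\operatorname{cov}^2\!\left(X_k\mathbf{a}_k^{\ast},\ \mathbf{t}\right)=\frac{\|X_k^{\top}\mathbf{t}\|^2}{(n-1)^2}.
$$

Summing over blocks with the concatenation $X=[X_1\,|\,\cdots\,|\,X_K]$:

$$
\sum_{k=1}^{K}\|X_k^{\top}\mathbf{t}\|^2=\mathbf{t}^{\top}\Big(\sum_{k=1}^{K}X_kX_k^{\top}\Big)\mathbf{t}=\mathbf{t}^{\top}XX^{\top}\mathbf{t}.
$$

Over unit vectors, this quantity is maximized by the leading eigenvector of $XX^{\top}$. Under these assumptions, the global embedding is therefore the first principal component score direction of the concatenated, preprocessed blocks, and each block embedding $X_k\mathbf{a}_k^{\ast}$ is the per-omic projection that covaries with it.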
We introduce nipalsMCIA, an MCIA implementation that solves the objective function using an extension of Non-linear Iterative Partial Least Squares (NIPALS). On single cell multi-omics data, it shows a significant speed-up over earlier implementations that rely on eigendecompositions. It also removes the dependence on an eigendecomposition for calculating the variance explained, and allows users to perform out-of-sample embedding of new data. nipalsMCIA provides users with a variety of pre-processing and parameter options, as well as easy-to-use functions for downstream analysis of single-omic and global-embedding factors.
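The NIPALS idea can be sketched in Python; the package itself is an R/Bioconductor package. This minimal, hypothetical sketch assumes a consensus-PCA-style update:

- block loadings $a_k \propto X_k^\top t$;
- block scores $X_k a_k$;
- block weights $w \propto T^\top t$;
- a new global score $T w$;
- deflation of every block on the global score.

The function names `preprocess_block`, `nipals_mcia`, and `embed_new`, the Frobenius-norm block scaling, and the deflation choice are all illustrative assumptions, not nipalsMCIA's API or its exact update rules. The sketch shows three properties under these assumptions:

- only matrix-vector products are needed;
- the variance explained comes from the sum of squares removed at each deflation step rather than from an eigendecomposition;
- new samples are embedded by applying the training means and scales, then replaying the stored loadings, weights, and deflation vectors.

```python
import numpy as np


def preprocess_block(X):
    """Column-center a block and scale it to unit Frobenius norm.

    Hypothetical normalization choice; the mean and scale are returned so
    that new samples can be transformed with the training statistics.
    """
    mean = X.mean(axis=0)
    scale = np.linalg.norm(X - mean)
    return (X - mean) / scale, mean, scale


def _block_step(blocks, t):
    """Block loadings a_k ~ X_k^T t, block scores X_k a_k, weights w ~ T^T t."""
    loadings = [X.T @ t / np.linalg.norm(X.T @ t) for X in blocks]
    T = np.column_stack([X @ a for X, a in zip(blocks, loadings)])
    w = T.T @ t / np.linalg.norm(T.T @ t)
    return loadings, w, T


def nipals_mcia(blocks, n_components=2, max_iter=1000, tol=1e-12, seed=0):
    """Return global scores, variance explained, and a model for new data."""
    rng = np.random.default_rng(seed)
    blocks = [X.copy() for X in blocks]
    total_ss = sum(np.sum(X ** 2) for X in blocks)
    model = {"loadings": [], "weights": [], "deflation": []}
    scores, var_explained = [], []
    for _comp in range(n_components):
        t = rng.standard_normal(blocks[0].shape[0])  # random starting score
        t /= np.linalg.norm(t)
        for _it in range(max_iter):
            loadings, w, T = _block_step(blocks, t)
            t_new = T @ w / np.linalg.norm(T @ w)  # updated global score
            done = np.linalg.norm(t_new - t) < tol
            t = t_new
            if done:
                break
        loadings, w, T = _block_step(blocks, t)
        g = T @ w  # global score for this component
        # Deflate each block on g; the sum of squares removed is this
        # component's share of the variance, with no eigendecomposition.
        p = [X.T @ g / (g @ g) for X in blocks]
        removed = sum(np.sum(np.outer(g, pk) ** 2) for pk in p)
        blocks = [X - np.outer(g, pk) for X, pk in zip(blocks, p)]
        scores.append(g)
        var_explained.append(removed / total_ss)
        model["loadings"].append(loadings)
        model["weights"].append(w)
        model["deflation"].append(p)
    return np.column_stack(scores), np.array(var_explained), model


def embed_new(new_blocks, model):
    """Out-of-sample embedding: replay loadings, weights, and deflation."""
    blocks = list(new_blocks)
    scores = []
    for loadings, w, p in zip(model["loadings"], model["weights"], model["deflation"]):
        g = np.column_stack([X @ a for X, a in zip(blocks, loadings)]) @ w
        blocks = [X - np.outer(g, pk) for X, pk in zip(blocks, p)]
        scores.append(g)
    return np.column_stack(scores)


# Usage: three synthetic blocks (60 samples) sharing two latent factors.
rng = np.random.default_rng(1)
z1, z2 = rng.standard_normal(60), rng.standard_normal(60)
raw = [5 * np.outer(z1, rng.standard_normal(p))
       + 2 * np.outer(z2, rng.standard_normal(p))
       + 0.1 * rng.standard_normal((60, p)) for p in (30, 20, 10)]
fitted = [preprocess_block(X) for X in raw]
blocks = [B for B, _, _ in fitted]
scores, var_explained, model = nipals_mcia(blocks, n_components=2)

# Stand-in "new" samples: perturbed copies of five training rows, which
# must reuse the training means and scales before embedding.
new_raw = [X[:5] + 0.05 * rng.standard_normal((5, X.shape[1])) for X in raw]
new_blocks = [(X - m) / s for X, (_, m, s) in zip(new_raw, fitted)]
new_scores = embed_new(new_blocks, model)  # one row per new sample
```

Each update multiplies the global score by $XX^\top$ of the concatenated (deflated) blocks, so under these assumptions the loop is a power iteration. With this normalization, its scores and variance shares should therefore agree with an SVD of the concatenated blocks. It also needs only a few matrix-vector products per component instead of a full eigendecomposition.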
nipalsMCIA is available as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/nipalsMCIA.html and includes detailed documentation and application vignettes. Supplementary Materials are available online.