Department of Biostatistics, University at Buffalo, Buffalo, NY, United States of America.
Biostatistics Division, University of Toronto, Toronto, Ontario, Canada.
PLoS One. 2021 Aug 3;16(8):e0255579. doi: 10.1371/journal.pone.0255579. eCollection 2021.
Multi-omic analyses that integrate many high-dimensional datasets often suffer from low statistical power and require time-consuming computations. To address these issues, we present SuMO-Fil, a pre-processing method for Supervised Multi-Omic Filtering that removes variables or features considered to be irrelevant noise. SuMO-Fil is intended to be run prior to downstream analyses that detect supervised gene networks in sparse settings. It filters out variables that show low similarity across the datasets in conjunction with low similarity with the outcome. This approach can improve accuracy and reduce run times for a variety of computationally expensive downstream analyses, for example when the downstream analysis includes sparse canonical correlation analysis. Filtering methods designed specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. Compared to popular filtering techniques based on low means or low variances, SuMO-Fil performs favorably, eliminating non-network features while retaining important biological signal under a variety of signal settings. We show that the speed and accuracy of methods such as supervised sparse canonical correlation analysis increase after applying SuMO-Fil, greatly improving the scalability of these approaches.
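The filtering idea described above can be illustrated with a minimal sketch. This is not the published SuMO-Fil implementation: the abstract does not specify the similarity measure or the cutoffs, so the sketch assumes absolute Pearson correlation as the similarity, two omic datasets measured on the same samples, and hypothetical cutoffs `cross_cut` and `outcome_cut`. The function names `abs_corr` and `filter_features` are illustrative only.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation between each column of a and each column of b."""
    # Standardize columns (assumes no column is constant).
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return np.abs(a.T @ b) / a.shape[0]

def filter_features(x1, x2, y, cross_cut=0.3, outcome_cut=0.3):
    """Return boolean keep-masks for the columns of x1 and x2.

    Assumed rule: a feature is dropped only when BOTH its strongest
    correlation with the other dataset and its correlation with the
    outcome fall below the (hypothetical) cutoffs.
    """
    cross = abs_corr(x1, x2)                      # shape (p1, p2)
    out1 = abs_corr(x1, y.reshape(-1, 1)).ravel()  # shape (p1,)
    out2 = abs_corr(x2, y.reshape(-1, 1)).ravel()  # shape (p2,)
    keep1 = (cross.max(axis=1) >= cross_cut) | (out1 >= outcome_cut)
    keep2 = (cross.max(axis=0) >= cross_cut) | (out2 >= outcome_cut)
    return keep1, keep2

# Simulated example: 5 network features per dataset driven by a shared
# latent signal z, followed by 20 pure-noise features per dataset.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 1))
x1 = np.hstack([z + 0.3 * rng.normal(size=(n, 5)), rng.normal(size=(n, 20))])
x2 = np.hstack([z + 0.3 * rng.normal(size=(n, 5)), rng.normal(size=(n, 20))])
y = z.ravel() + 0.5 * rng.normal(size=n)

keep1, keep2 = filter_features(x1, x2, y)
x1_filtered, x2_filtered = x1[:, keep1], x2[:, keep2]
print(keep1.sum(), "of", x1.shape[1], "features kept in dataset 1")
print(keep2.sum(), "of", x2.shape[1], "features kept in dataset 2")
```

The reduced matrices `x1_filtered` and `x2_filtered` would then be passed to the downstream method, such as a sparse canonical correlation analysis, which now operates on far fewer columns.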