Gliozzo Jessica, Soto-Gomez Mauricio, Guarino Valentina, Bonometti Arturo, Cabri Alberto, Cavalleri Emanuele, Reese Justin, Robinson Peter N, Mesiti Marco, Valentini Giorgio, Casiraghi Elena
AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; European Commission, Joint Research Centre (JRC), Ispra, Italy.
AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy.
Artif Intell Med. 2025 Feb;160:103049. doi: 10.1016/j.artmed.2024.103049. Epub 2024 Dec 11.
Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging because of their high dimensionality and limited sample sizes, which call for proper data-reduction pipelines to ensure reliable analyses. Additionally, their multimodal nature requires effective data-integration pipelines. While several dimensionality reduction and data fusion algorithms have been proposed, crucial aspects are often overlooked. Specifically, the dimension of the projection space is typically chosen heuristically and applied uniformly across all omics, neglecting the distinct high-dimension, small-sample-size challenges faced by each individual omics view. This paper introduces a novel multi-modal dimensionality reduction pipeline tailored to individual views. By leveraging intrinsic dimensionality estimators, we assess the impact of the curse of dimensionality on each view and propose a two-step reduction strategy for significantly affected views, combining feature selection with feature extraction. Compared to traditional uniform reduction pipelines in a crucial supervised multi-omics analysis setting, our approach shows significant improvement. Additionally, we explore three effective unsupervised multi-omics data fusion methods, each rooted in one of the main data fusion strategies, to gain insights into their performance under crucial, yet overlooked, settings.
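The per-view idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract does not name its estimators, selection method, or extraction method: the sketch uses the maximum-likelihood form of the TwoNN intrinsic dimensionality estimator, variance ranking as a placeholder feature selector, and PCA as the feature extractor. The function names (`twonn_intrinsic_dim`, `pca_project`, `reduce_view`), the `max_features_per_sample` threshold, and the rule of keeping as many features as there are samples are hypothetical choices made only for illustration.

```python
import numpy as np


def twonn_intrinsic_dim(X):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula.

    For each sample, mu = r2 / r1 is the ratio of the distances to its second
    and first nearest neighbours; the estimate is n / sum(log(mu)).
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # centering reduces cancellation error below
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.maximum(d2, 0.0, out=d2)       # clip tiny negative round-off values
    np.fill_diagonal(d2, np.inf)      # exclude each point from its own search
    two_nearest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(two_nearest[:, 0])
    r2 = np.sqrt(two_nearest[:, 1])
    valid = r1 > 0                    # drop exact duplicates of other samples
    return valid.sum() / np.sum(np.log(r2[valid] / r1[valid]))


def pca_project(X, k):
    """Project samples onto the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, S.size)
    return U[:, :k] * S[:k]


def reduce_view(X, max_features_per_sample=1.0):
    """Reduce one omics view to its estimated intrinsic dimension.

    Assumption: a view counts as "significantly affected" when the number of
    features exceeds max_features_per_sample * n_samples. Affected views get
    feature selection first, then feature extraction; others get extraction
    only. Returns (embedding, estimated_dimension, used_two_steps).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    id_hat = twonn_intrinsic_dim(X)
    k = max(1, int(np.ceil(id_hat)))
    affected = p > max_features_per_sample * n
    if affected:
        # Step 1 (placeholder selector): keep the n highest-variance features.
        keep = np.argsort(X.var(axis=0))[::-1][:n]
        X = X[:, np.sort(keep)]
    # Step 2: feature extraction into a k-dimensional space.
    return pca_project(X, k), id_hat, affected
```

Applied to a list of views, `[reduce_view(V) for V in views]` would give each view its own projection dimension rather than a single value shared across omics, which is the contrast with uniform pipelines that the abstract draws. In a supervised setting the variance ranking would more plausibly be replaced by a label-aware selector fitted on training data only.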