Department of Biostatistics, University of Oslo, Oslo, Norway.
Bioinformatics. 2019 May 15;35(10):1625-1633. doi: 10.1093/bioinformatics/bty847.
Visualization of high-dimensional data is an important step in exploratory data analysis and knowledge discovery. It is challenging, however, because interpretation is highly subjective. If dimensionality reduction (DR) techniques are regarded as the main tool for data visualization, they act like multiple cameras that look into the data from different perspectives. No single perspective can be prescribed for all datasets and problems, and one snapshot cannot reveal all the relevant aspects of data in higher dimensions. The reason is that each method follows its own strategy, normally based on well-established mathematical theory, to obtain a low-dimensional projection of the data, and the resulting projection can differ completely from those of other methods. Relying on a single projection is therefore risky, because it can hide important parts of the full knowledge space.
We propose the first framework for multi-insight visualization of multi-omics data. In contrast to single-insight approaches, it is able to uncover the majority of data features through multiple insights. The main idea is to combine several DR methods via tensor factorization and to group the solutions into an optimal number of clusters (or insights). Experimental evaluation on low-dimensional synthetic data, simulated multi-omics data related to ovarian cancer, and real multi-omics data related to breast cancer shows a competitive advantage over state-of-the-art methods.
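The idea of combining several DR methods via tensor factorization and grouping the solutions into clusters can be pictured with the minimal sketch below. It is not the authors' implementation. The choice of DR methods (PCA and Gaussian random projections), the use of normalized pairwise-distance matrices as tensor slices, the truncated SVD of the mode-3 unfolding as a stand-in for the tensor factorization, the plain k-means step, and the fixed number of clusters are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np


def pca_embed(X, k=2):
    """Project centered data onto its first k principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def random_projection_embed(X, k=2, seed=0):
    """Project centered data with a random Gaussian matrix (assumed DR method)."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return (X - X.mean(axis=0)) @ R


def pairwise_distances(Y):
    """Euclidean distance matrix of an embedding, scaled to [0, 1]."""
    diff = Y[:, None, :] - Y[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return D / D.max()


def method_factors(tensor, rank):
    """Truncated SVD of the mode-3 unfolding: one factor row per DR method.

    This is a simple stand-in for a tensor factorization, not the
    factorization used in the paper.
    """
    m = tensor.shape[2]
    unfolded = tensor.reshape(-1, m).T  # shape (m, n * n)
    U, S, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U[:, :rank] * S[:rank]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


# Synthetic data: 60 samples, 10 features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))

# Several DR "cameras" looking at the same data.
embeddings = [
    pca_embed(X),
    pca_embed(X / X.std(axis=0)),        # PCA on standardized features
    random_projection_embed(X, seed=1),
    random_projection_embed(X, seed=2),
]

# Stack one distance matrix per method into an (n, n, m) tensor.
tensor = np.stack([pairwise_distances(Y) for Y in embeddings], axis=2)

# Factorize, then group the methods into a fixed number of "insights".
factors = method_factors(tensor, rank=2)
labels = kmeans(factors, k=2)
print(labels)
```

Each label assigns one DR method to a group, so methods that produce similar views of the data tend to end up in the same group. The number of groups is fixed at two here purely for simplicity; how an optimal number would be chosen is not covered by this sketch.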
https://folk.uio.no/hadift/MIV/ [user/pass via hadift@medisin.uio.no].
Supplementary data are available at Bioinformatics online.